FairPrice

Posted on • Originally published at apify.com

How to Scrape Bluesky Posts and Profiles at Scale (No API Key Needed)

Why Scrape Bluesky?

Bluesky has grown to over 30 million users, built on the open AT Protocol. Unlike X/Twitter, which has locked its API behind expensive paid tiers, Bluesky's public API is free to use — no API key, no OAuth, and generous rate limits.

This makes it a goldmine for:

  • Social media monitoring — track brand mentions and sentiment
  • Lead generation — find people discussing topics relevant to your business
  • Market research — analyze conversations and trending topics
  • Academic research — study public discourse on decentralized social media

In this tutorial, I'll show you how to scrape Bluesky posts and profiles at scale using an Apify actor I built.

The Easy Way: Use the Apify Actor

I published a free Bluesky Scraper on Apify that handles everything — pagination, data extraction, and export. Here's how to use it.

Step 1: Set Up Your Input

The actor accepts a simple JSON input:

{
    "searchQuery": "artificial intelligence",
    "handles": ["jay.bsky.team", "bsky.app"],
    "maxItems": 200,
    "scrapeType": "both"
}

Parameters:

  • searchQuery — keyword to search posts or profiles for
  • handles — specific Bluesky handles to scrape
  • maxItems — maximum results (up to 10,000)
  • scrapeType — posts, profiles, or both

Step 2: Run It

You can run it from the Apify Console UI, or programmatically:

JavaScript:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
const run = await client.actor('cryptosignals/bluesky-scraper').call({
    searchQuery: 'startup funding',
    maxItems: 500,
    scrapeType: 'posts'
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} posts`);

Python:

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/bluesky-scraper').call(run_input={
    'searchQuery': 'startup funding',
    'maxItems': 500,
    'scrapeType': 'posts'
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Got {len(items)} posts')

Step 3: Export Your Data

Every run outputs structured data you can export as JSON, CSV, or Excel directly from Apify.

Post data includes:

  • Author handle, display name, and DID
  • Full post text
  • Engagement metrics: likes, reposts, replies, quotes, bookmarks
  • Timestamps (created and indexed)
  • Direct URL to the post
  • Embedded images and external links
  • Language detection

Profile data includes:

  • Handle, display name, bio
  • Follower/following/post counts
  • Avatar and banner URLs
  • Account creation date
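If you'd rather pull the export programmatically than click through the Console, Apify's dataset items endpoint accepts a format parameter (json, csv, or xlsx). Here's a minimal sketch — the dataset ID is a placeholder you'd substitute from your run:

```python
import urllib.request

def export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build the Apify dataset export URL for a given format."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}&clean=true"

# Usage (substitute the defaultDatasetId from your run):
# data = urllib.request.urlopen(export_url("YOUR_DATASET_ID")).read()
# open("posts.csv", "wb").write(data)
```

The clean=true flag strips Apify's internal metadata fields so the CSV contains only your scraped columns.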

The DIY Way: AT Protocol API Basics

If you want to build your own scraper, here are the key endpoints. The AT Protocol public API at https://public.api.bsky.app/xrpc requires no authentication:

Search Posts

curl "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=AI&limit=25"

Get User Profile

curl "https://public.api.bsky.app/xrpc/app.bsky.actor.getProfile?actor=jay.bsky.team"

Get Author Feed

curl "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=bsky.app&limit=50"

Search Users

curl "https://public.api.bsky.app/xrpc/app.bsky.actor.searchActors?q=developer&limit=25"

All endpoints return JSON and support cursor-based pagination. The cursor field in the response tells you how to fetch the next page.

Tips for Scraping at Scale

  1. Respect rate limits. The public API is generous but not unlimited. Add small delays between requests if scraping thousands of items.

  2. Use pagination cursors. Don't try to get everything in one request. Loop through pages using the cursor.

  3. Handle errors gracefully. The API occasionally returns 429 (rate limit) or 502 (server error). Implement exponential backoff.

  4. Filter by language. Posts include a langs field — use it to focus on content in your target language.

  5. Store data incrementally. Push results to your dataset as you go rather than collecting everything in memory.
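Tips 1 and 3 can be combined into one small retry helper. This is a sketch, not part of the actor — the doubling delay schedule is just one reasonable choice:

```python
import time
import urllib.error
import urllib.request

def backoff_schedule(base_delay: float, max_retries: int) -> list:
    """Doubling delays: base, 2*base, 4*base, ..."""
    return [base_delay * (2 ** i) for i in range(max_retries)]

def fetch_with_backoff(url: str, max_retries: int = 5, base_delay: float = 1.0) -> bytes:
    """GET a URL, sleeping through the backoff schedule on 429/5xx errors."""
    for attempt, delay in enumerate(backoff_schedule(base_delay, max_retries)):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code in (429, 500, 502, 503) and attempt < max_retries - 1:
                time.sleep(delay)  # wait, then retry
            else:
                raise
```

Exponential backoff means a brief 429 costs you a second or two, while a sustained outage fails fast instead of hammering the server.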

Wrapping Up

Bluesky's open AT Protocol makes it one of the easiest social networks to scrape legally and ethically. Whether you use the Apify actor for a no-code solution or build your own scraper with the public API, you can extract valuable social data in minutes.

If you found this useful, give the actor a star on Apify — it helps others discover it.


Have questions about scraping Bluesky or the AT Protocol? Drop a comment below.
