Bluesky just crossed 30 million users, and the best part? Its AT Protocol makes all public data freely accessible — no API keys, no rate limit headaches, no $42K/month Twitter API bills.
In this tutorial, you'll build a Python dashboard that scrapes Bluesky posts, analyzes engagement patterns, and outputs actionable insights. Whether you're doing brand monitoring, competitor research, or tracking trending topics — this gets you from zero to working analytics in about 30 minutes.
What We're Building
A Python script that:
- Scrapes Bluesky posts matching your search terms
- Processes the data into a structured format
- Generates engagement analytics (top posts, posting times, hashtag frequency)
- Outputs a clean CSV report you can open in any spreadsheet tool
Prerequisites
- Python 3.8+
- An Apify free account (no credit card needed)
- 10 minutes of patience
Install the dependencies:

```bash
pip install apify-client pandas matplotlib
```
Step 1: Set Up the Bluesky Scraper
We'll use the Bluesky Posts Scraper on Apify, which handles all the AT Protocol complexity for you. It supports search queries, user profile scraping, and hashtag tracking.
First, grab your API token from Apify Console and store it:
```python
# bluesky_dashboard.py
from apify_client import ApifyClient
import pandas as pd

# Initialize the Apify client
client = ApifyClient("your-apify-api-token")

def scrape_bluesky(search_term, max_posts=200):
    """Scrape Bluesky posts matching a search term."""
    run_input = {
        "searchTerms": [search_term],
        "maxItems": max_posts,
        "sort": "latest",
    }
    print(f"Scraping Bluesky for '{search_term}'...")
    run = client.actor("cryptosignals/bluesky-scraper").call(run_input=run_input)

    # Fetch results from the dataset
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    print(f"Found {len(items)} posts")
    return items
```
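Field names vary from scraper to scraper, so it's worth confirming what the actor actually returns before wiring up the processing step. A tiny helper like this (my own addition, not part of the actor's API) prints the keys of the first few items:

```python
def preview_fields(items, n=3):
    """Print the keys of the first few scraped items so you can
    confirm field names before building the DataFrame."""
    for item in items[:n]:
        print(sorted(item.keys()))
```

If the keys don't match the ones used in the next step (`likeCount`, `repostCount`, and so on), adjust the `process_posts` mapping accordingly.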
Step 2: Process the Data
Raw JSON is useful, but structured data is where insights live. Let's transform the scraped posts into a clean DataFrame:
```python
def process_posts(items):
    """Convert raw scraped data into a structured DataFrame."""
    records = []
    for item in items:
        records.append({
            "author": item.get("author", {}).get("handle", "unknown"),
            "text": item.get("text", ""),
            "likes": item.get("likeCount", 0),
            "reposts": item.get("repostCount", 0),
            "replies": item.get("replyCount", 0),
            "created_at": item.get("createdAt", ""),
            "url": item.get("url", ""),
        })
    df = pd.DataFrame(records)

    # Parse timestamps, dropping rows whose timestamp can't be parsed
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    df = df.dropna(subset=["created_at"])
    df["hour"] = df["created_at"].dt.hour
    df["day_of_week"] = df["created_at"].dt.day_name()

    # Calculate engagement score
    df["engagement"] = df["likes"] + (df["reposts"] * 2) + (df["replies"] * 3)
    return df
```
The engagement formula weights replies highest (they indicate conversation), then reposts (reach), then likes (passive appreciation). Adjust these weights based on what matters for your use case.
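If you expect to tune the weights often, it can help to factor the formula into a small parameterized function. This is just a sketch; the defaults mirror the formula above:

```python
import pandas as pd

def engagement_score(df, w_like=1, w_repost=2, w_reply=3):
    """Weighted engagement score; tune the weights per use case."""
    return (
        df["likes"] * w_like
        + df["reposts"] * w_repost
        + df["replies"] * w_reply
    )
```

For example, `df["engagement"] = engagement_score(df, w_reply=5)` if conversation matters most to you.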
Step 3: Generate Analytics
Now the interesting part — extracting patterns:
```python
def analyze_engagement(df):
    """Generate engagement analytics from processed posts."""
    print("\nBLUESKY ANALYTICS REPORT")
    print("=" * 50)

    # Top performing posts
    print("\nTop 5 Posts by Engagement:")
    top_posts = df.nlargest(5, "engagement")[["author", "text", "engagement", "url"]]
    for _, row in top_posts.iterrows():
        preview = row["text"][:80] + "..." if len(row["text"]) > 80 else row["text"]
        print(f"  [{row['engagement']}] @{row['author']}: {preview}")

    # Best posting times
    print("\nAverage Engagement by Hour:")
    hourly = df.groupby("hour")["engagement"].mean().sort_values(ascending=False)
    for hour, eng in hourly.head(5).items():
        print(f"  {hour:02d}:00 - avg engagement: {eng:.1f}")

    # Most active authors
    print("\nMost Active Authors:")
    authors = df["author"].value_counts().head(10)
    for author, count in authors.items():
        avg_eng = df[df["author"] == author]["engagement"].mean()
        print(f"  @{author}: {count} posts (avg engagement: {avg_eng:.1f})")

    # Day of week analysis
    print("\nEngagement by Day:")
    daily = df.groupby("day_of_week")["engagement"].mean()
    day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    for day in day_order:
        if day in daily.index:
            print(f"  {day}: {daily[day]:.1f}")

    return {
        "total_posts": len(df),
        "avg_engagement": df["engagement"].mean(),
        "peak_hour": hourly.index[0] if len(hourly) > 0 else None,
        "top_author": authors.index[0] if len(authors) > 0 else None,
    }

def export_csv(df, filename="bluesky_report.csv"):
    """Export processed data to CSV."""
    df.to_csv(filename, index=False)
    print(f"\nFull data exported to {filename}")
```
Step 4: Visualization (Optional)
If you want a quick visual summary:
```python
def plot_engagement(df):
    """Create a simple engagement visualization."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Engagement by hour
    hourly = df.groupby("hour")["engagement"].mean()
    axes[0].bar(hourly.index, hourly.values, color="#0085ff")
    axes[0].set_title("Avg Engagement by Hour (UTC)")
    axes[0].set_xlabel("Hour")
    axes[0].set_ylabel("Engagement Score")

    # Post volume by day
    daily_count = df["day_of_week"].value_counts()
    day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    daily_sorted = daily_count.reindex(day_order).fillna(0)
    axes[1].bar(range(7), daily_sorted.values, color="#0085ff")
    axes[1].set_xticks(range(7))
    axes[1].set_xticklabels([d[:3] for d in day_order])
    axes[1].set_title("Post Volume by Day")

    plt.tight_layout()
    plt.savefig("bluesky_dashboard.png", dpi=150)
    print("Chart saved to bluesky_dashboard.png")
```
Step 5: Put It All Together
```python
if __name__ == "__main__":
    # Change this to whatever you want to track
    SEARCH_TERM = "python programming"

    # Scrape
    raw_posts = scrape_bluesky(SEARCH_TERM, max_posts=300)

    # Process
    df = process_posts(raw_posts)

    # Analyze
    summary = analyze_engagement(df)

    # Export
    export_csv(df, f"bluesky_{SEARCH_TERM.replace(' ', '_')}.csv")

    # Visualize (comment out if you don't need charts)
    plot_engagement(df)
```
Run it:
```bash
python bluesky_dashboard.py
```
You'll get a terminal report, a CSV file with all the data, and a PNG chart — all in one go.
Self-Hosting Alternative
If you're a hobbyist or want to avoid any API costs, I also maintain a free self-hosted version of the Bluesky scraper API at:
https://frog03-20494.wykr.es/api/v1/
It's running on a small VPS, so it has lower rate limits, but it works great for personal projects and experimentation. You can swap the Apify calls for simple HTTP requests:
```python
import requests

def scrape_bluesky_selfhosted(search_term, max_posts=50):
    resp = requests.get(
        "https://frog03-20494.wykr.es/api/v1/search",
        params={"q": search_term, "limit": max_posts},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("posts", [])
```
Scaling Up: Handling Proxies
If you're doing high-volume research across multiple search terms or tracking hundreds of accounts, you may start hitting rate limits from your IP. A rotating proxy service like ScraperAPI can help — it handles proxy rotation, CAPTCHAs, and retries so you can focus on the data. This matters more when you're scraping at scale or combining Bluesky data with other platforms.
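As a sketch of how a proxy plugs into requests: the proxy URL and credentials below are placeholders for whatever your provider gives you, and the helper names are my own.

```python
import requests

def proxy_kwargs(proxy_url, timeout=30):
    """Build keyword arguments that route a request through a proxy."""
    return {
        "proxies": {"http": proxy_url, "https": proxy_url},
        "timeout": timeout,
    }

def fetch_via_proxy(url, params, proxy_url):
    """GET a URL through the proxy and return parsed JSON."""
    resp = requests.get(url, params=params, **proxy_kwargs(proxy_url))
    resp.raise_for_status()
    return resp.json()
```

Usage would look like `fetch_via_proxy(url, params, proxy_url="http://user:pass@proxy.example.com:8001")`, with the endpoint swapped for your provider's gateway.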
Real-World Use Cases
Brand monitoring: Track mentions of your company, product, or competitors. The engagement scores tell you which conversations are getting traction.
Content research: Find what topics get the most engagement in your niche before you write your next post.
Academic research: Bluesky has become popular with researchers and journalists. This pipeline gives you structured data for discourse analysis.
Trend detection: Run the scraper on a schedule (Apify has built-in cron) and compare engagement patterns over time to spot emerging trends.
What's Next
From here, you could:
- Add sentiment analysis using TextBlob or a local LLM
- Set up scheduled runs with Apify's cron feature
- Build a Streamlit dashboard for real-time monitoring
- Pipe the data into a database for long-term trend tracking
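The database option needs nothing beyond the standard library. Here's a sketch using sqlite3 and pandas' `to_sql`; the `posts` table name and `bluesky_trends.db` filename are my own choices:

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone

import pandas as pd

def store_run(df, db_path="bluesky_trends.db"):
    """Append one scrape run to a local SQLite database,
    stamped with collection time so runs can be compared later."""
    df = df.copy()
    df["collected_at"] = datetime.now(timezone.utc).isoformat()
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql("posts", conn, if_exists="append", index=False)

def engagement_trend(db_path="bluesky_trends.db"):
    """Average engagement per collection run, oldest first."""
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql(
            "SELECT collected_at, AVG(engagement) AS avg_engagement "
            "FROM posts GROUP BY collected_at ORDER BY collected_at",
            conn,
        )
```

Call `store_run(df)` at the end of each scheduled scrape, then plot `engagement_trend()` whenever you want the long-term picture.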
The AT Protocol's openness makes Bluesky one of the easiest social platforms to work with programmatically. No OAuth dance, no approval process, no surprise API changes — just data.
The complete code is about 100 lines of Python. Fork it, modify it, and let me know what you build with it.
Found this useful? Follow me for more practical Python tutorials on data collection and analysis.