DEV Community

agenthustler
Build a Bluesky Analytics Dashboard with Python (Step-by-Step)

Bluesky just crossed 30 million users, and the best part? Its AT Protocol makes all public data freely accessible — no API keys, no rate limit headaches, no $42K/month Twitter API bills.

In this tutorial, you'll build a Python dashboard that scrapes Bluesky posts, analyzes engagement patterns, and outputs actionable insights. Whether you're doing brand monitoring, competitor research, or tracking trending topics — this gets you from zero to working analytics in about 30 minutes.

What We're Building

A Python script that:

  1. Scrapes Bluesky posts matching your search terms
  2. Processes the data into a structured format
  3. Generates engagement analytics (top posts, posting times, hashtag frequency)
  4. Outputs a clean CSV report you can open in any spreadsheet tool

Prerequisites

  • Python 3.8+
  • A free Apify account (no credit card needed)
  • 10 minutes of patience
Install the dependencies:

```shell
pip install apify-client pandas matplotlib
```

Step 1: Set Up the Bluesky Scraper

We'll use the Bluesky Posts Scraper on Apify, which handles all the AT Protocol complexity for you. It supports search queries, user profile scraping, and hashtag tracking.

First, grab your API token from Apify Console and store it:

```python
# bluesky_dashboard.py
import os

import pandas as pd
from apify_client import ApifyClient

# Initialize the Apify client (read the token from an environment
# variable rather than hardcoding it in the script)
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

def scrape_bluesky(search_term, max_posts=200):
    """Scrape Bluesky posts matching a search term."""

    run_input = {
        "searchTerms": [search_term],
        "maxItems": max_posts,
        "sort": "latest",
    }

    print(f"Scraping Bluesky for '{search_term}'...")
    run = client.actor("cryptosignals/bluesky-scraper").call(run_input=run_input)

    # Fetch results from the run's default dataset
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    print(f"Found {len(items)} posts")

    return items
```

Step 2: Process the Data

Raw JSON is useful, but structured data is where insights live. Let's transform the scraped posts into a clean DataFrame:

```python
def process_posts(items):
    """Convert raw scraped data into a structured DataFrame."""

    records = []
    for item in items:
        records.append({
            "author": item.get("author", {}).get("handle", "unknown"),
            "text": item.get("text", ""),
            "likes": item.get("likeCount", 0),
            "reposts": item.get("repostCount", 0),
            "replies": item.get("replyCount", 0),
            "created_at": item.get("createdAt", ""),
            "url": item.get("url", ""),
        })

    df = pd.DataFrame(records)

    # Parse timestamps
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    df["hour"] = df["created_at"].dt.hour
    df["day_of_week"] = df["created_at"].dt.day_name()

    # Calculate engagement score
    df["engagement"] = df["likes"] + (df["reposts"] * 2) + (df["replies"] * 3)

    return df
```

The engagement formula weights replies highest (they indicate conversation), then reposts (reach), then likes (passive appreciation). Adjust these weights based on what matters for your use case.
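If you expect to tweak those weights often, it can help to pull them out into a small helper rather than editing the formula inline. The function and weight names below are my own, not part of the scraper's output:

```python
import pandas as pd

# Hypothetical helper: keep the engagement weights in one configurable place.
DEFAULT_WEIGHTS = {"likes": 1, "reposts": 2, "replies": 3}

def engagement_score(df: pd.DataFrame, weights: dict = DEFAULT_WEIGHTS) -> pd.Series:
    """Weighted engagement score; adjust the weights per use case."""
    return (
        df["likes"] * weights["likes"]
        + df["reposts"] * weights["reposts"]
        + df["replies"] * weights["replies"]
    )

df = pd.DataFrame({"likes": [10, 0], "reposts": [2, 5], "replies": [1, 0]})
print(engagement_score(df).tolist())  # → [17, 10]
```

Swapping in `{"likes": 0, "reposts": 1, "replies": 1}`, for example, scores pure conversation and reach while ignoring likes entirely.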

Step 3: Generate Analytics

Now the interesting part — extracting patterns:

```python
def analyze_engagement(df):
    """Generate engagement analytics from processed posts."""

    print("\nBLUESKY ANALYTICS REPORT")
    print("=" * 50)

    # Top performing posts
    print("\nTop 5 Posts by Engagement:")
    top_posts = df.nlargest(5, "engagement")[["author", "text", "engagement", "url"]]
    for _, row in top_posts.iterrows():
        preview = row["text"][:80] + "..." if len(row["text"]) > 80 else row["text"]
        print(f"  [{row['engagement']}] @{row['author']}: {preview}")

    # Best posting times
    print("\nAverage Engagement by Hour:")
    hourly = df.groupby("hour")["engagement"].mean().sort_values(ascending=False)
    for hour, eng in hourly.head(5).items():
        print(f"  {int(hour):02d}:00 - avg engagement: {eng:.1f}")

    # Most active authors
    print("\nMost Active Authors:")
    authors = df["author"].value_counts().head(10)
    for author, count in authors.items():
        avg_eng = df[df["author"] == author]["engagement"].mean()
        print(f"  @{author}: {count} posts (avg engagement: {avg_eng:.1f})")

    # Day of week analysis
    print("\nEngagement by Day:")
    daily = df.groupby("day_of_week")["engagement"].mean()
    day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    for day in day_order:
        if day in daily.index:
            print(f"  {day}: {daily[day]:.1f}")

    return {
        "total_posts": len(df),
        "avg_engagement": df["engagement"].mean(),
        "peak_hour": hourly.index[0] if len(hourly) > 0 else None,
        "top_author": authors.index[0] if len(authors) > 0 else None,
    }


def export_csv(df, filename="bluesky_report.csv"):
    """Export processed data to CSV."""
    df.to_csv(filename, index=False)
    print(f"\nFull data exported to {filename}")
```

Step 4: Visualization (Optional)

If you want a quick visual summary:

```python
def plot_engagement(df):
    """Create a simple engagement visualization."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Engagement by hour
    hourly = df.groupby("hour")["engagement"].mean()
    axes[0].bar(hourly.index, hourly.values, color="#0085ff")
    axes[0].set_title("Avg Engagement by Hour (UTC)")
    axes[0].set_xlabel("Hour")
    axes[0].set_ylabel("Engagement Score")

    # Post volume by day
    daily_count = df["day_of_week"].value_counts()
    day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    daily_sorted = daily_count.reindex(day_order).fillna(0)
    axes[1].bar(range(7), daily_sorted.values, color="#0085ff")
    axes[1].set_xticks(range(7))
    axes[1].set_xticklabels([d[:3] for d in day_order])
    axes[1].set_title("Post Volume by Day")

    plt.tight_layout()
    plt.savefig("bluesky_dashboard.png", dpi=150)
    print("Chart saved to bluesky_dashboard.png")
```

Step 5: Put It All Together

```python
if __name__ == "__main__":
    # Change this to whatever you want to track
    SEARCH_TERM = "python programming"

    # Scrape
    raw_posts = scrape_bluesky(SEARCH_TERM, max_posts=300)

    # Process
    df = process_posts(raw_posts)

    # Analyze
    summary = analyze_engagement(df)

    # Export
    export_csv(df, f"bluesky_{SEARCH_TERM.replace(' ', '_')}.csv")

    # Visualize (comment out if you don't need charts)
    plot_engagement(df)
```

Run it:

```shell
python bluesky_dashboard.py
```

You'll get a terminal report, a CSV file with all the data, and a PNG chart — all in one go.

Self-Hosting Alternative

If you're a hobbyist or want to avoid any API costs, I also maintain a free self-hosted version of the Bluesky scraper API at:

https://frog03-20494.wykr.es/api/v1/

It's running on a small VPS, so it has lower rate limits, but it works great for personal projects and experimentation. You can swap the Apify calls for simple HTTP requests:

```python
import requests

def scrape_bluesky_selfhosted(search_term, max_posts=50):
    resp = requests.get(
        "https://frog03-20494.wykr.es/api/v1/search",
        params={"q": search_term, "limit": max_posts},
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json().get("posts", [])
```

Scaling Up: Handling Proxies

If you're doing high-volume research across multiple search terms or tracking hundreds of accounts, you may start hitting rate limits from your IP. A rotating proxy service like ScraperAPI can help — it handles proxy rotation, CAPTCHAs, and retries so you can focus on the data. This matters more when you're scraping at scale or combining Bluesky data with other platforms.
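As a sketch of how that plugs in: ScraperAPI's documented pattern is a GET to its endpoint with your key and the target URL as query parameters. Verify the endpoint and parameter names against their current docs before relying on this:

```python
import requests

def via_scraperapi(target_url: str, api_key: str) -> requests.PreparedRequest:
    """Build a request routed through ScraperAPI's proxy endpoint.

    Send the returned prepared request with requests.Session().send(...).
    """
    req = requests.Request(
        "GET",
        "https://api.scraperapi.com/",
        params={"api_key": api_key, "url": target_url},
    )
    return req.prepare()

prepared = via_scraperapi("https://bsky.app/search?q=python", "YOUR_KEY")
print(prepared.url)
```

Building the request separately from sending it also makes the proxy layer easy to unit-test without burning API credits.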

Real-World Use Cases

Brand monitoring: Track mentions of your company, product, or competitors. The engagement scores tell you which conversations are getting traction.

Content research: Find what topics get the most engagement in your niche before you write your next post.

Academic research: Bluesky has become popular with researchers and journalists. This pipeline gives you structured data for discourse analysis.

Trend detection: Run the scraper on a schedule (Apify has built-in cron) and compare engagement patterns over time to spot emerging trends.
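For the trend-detection case, one simple approach is to diff two scheduled runs' CSV exports. The column names match the DataFrame built in Step 2; the helper name and snapshot filenames are my own:

```python
import pandas as pd

def engagement_delta(old_csv: str, new_csv: str) -> pd.DataFrame:
    """Compare average engagement per author between two snapshots."""
    old = pd.read_csv(old_csv).groupby("author")["engagement"].mean()
    new = pd.read_csv(new_csv).groupby("author")["engagement"].mean()
    # Positive values = authors gaining traction since the last run
    delta = (new - old).dropna().sort_values(ascending=False)
    return delta.to_frame("engagement_change")

# e.g. engagement_delta("bluesky_monday.csv", "bluesky_friday.csv")
```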

What's Next

From here, you could:

  • Add sentiment analysis using TextBlob or a local LLM
  • Set up scheduled runs with Apify's cron feature
  • Build a Streamlit dashboard for real-time monitoring
  • Pipe the data into a database for long-term trend tracking
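As a taste of the sentiment idea, here's a tiny lexicon-based scorer you could later swap for TextBlob. The word lists are illustrative placeholders, not a real sentiment lexicon:

```python
# Toy lexicons for illustration only; replace with a real sentiment model.
POSITIVE = {"great", "love", "awesome", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "broken", "sad"}

def naive_sentiment(text: str) -> float:
    """Crude polarity in [-1, 1]: (pos - neg) / matched words; 0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(naive_sentiment("I love this, it's awesome!"))  # → 1.0
```

With TextBlob installed, `TextBlob(text).sentiment.polarity` is a drop-in upgrade, and you can apply either with `df["text"].apply(...)` to add a sentiment column.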

The AT Protocol's openness makes Bluesky one of the easiest social platforms to work with programmatically. No OAuth dance, no approval process, no surprise API changes — just data.
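To make the "no API keys" point concrete: the public AppView exposes XRPC endpoints you can hit with plain HTTP. The endpoint and parameter names below follow the AT Protocol lexicon as I understand it; double-check against the current Bluesky HTTP API docs:

```python
import requests

# Public AppView endpoint; no auth token needed for public data.
SEARCH_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"

def search_posts_raw(query: str, limit: int = 25) -> list:
    """Fetch public posts straight from the AppView, no scraper in between."""
    resp = requests.get(SEARCH_URL, params={"q": query, "limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("posts", [])
```

Note the raw response shape differs from the scraper's output, so `process_posts` would need its field names adjusted if you go this route.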

The complete code is about 100 lines of Python. Fork it, modify it, and let me know what you build with it.


Found this useful? Follow me for more practical Python tutorials on data collection and analysis.
