The X API Pricing Problem
Twitter's API (now X) has become prohibitively expensive for most developers. The Basic tier costs $100/month for just 10,000 tweet reads per month. The Pro tier runs $5,000/month. For indie developers and researchers, these prices are a non-starter.
But social media data is still incredibly valuable — for sentiment analysis, trend detection, lead generation, and competitive intelligence. Let's explore the practical alternatives.
Option 1: Direct Web Scraping
X's web interface loads data via internal GraphQL endpoints. You can intercept these:
```python
import requests
from urllib.parse import quote

def search_tweets(query, count=20):
    """Search tweets via X's live-search page, fetched through a scraping API."""
    target = f"https://x.com/search?q={quote(query)}&f=live"
    resp = requests.get(
        "https://api.scraperapi.com",
        params={"api_key": "YOUR_KEY", "url": target, "render": "true"},
        timeout=60,
    )
    resp.raise_for_status()
    # parse_tweet_html is your own HTML parser for the rendered page
    return parse_tweet_html(resp.text)[:count]
```

Passing the target URL through `params` lets requests handle the URL encoding; only the search query itself needs explicit quoting.
Option 2: Nitter Instances
Nitter is an open-source alternative Twitter frontend that does not require authentication. Be aware that most public instances went offline in early 2024 after X closed the guest-account mechanism Nitter relied on, so check which instances are currently operational before depending on one:
```python
import requests
from bs4 import BeautifulSoup

def scrape_nitter(username, instance="nitter.net"):
    """Scrape recent tweets from a Nitter instance."""
    url = f"https://{instance}/{username}"
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    tweets = []
    for tweet in soup.select(".timeline-item"):
        content = tweet.select_one(".tweet-content")
        date = tweet.select_one(".tweet-date a")
        if content:
            tweets.append({
                "text": content.get_text(strip=True),
                "date": date.get("title") if date else None,
            })
    return tweets

tweets = scrape_nitter("elonmusk")
for t in tweets[:5]:
    print(f"{t['date']}: {t['text'][:100]}")
```
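Because public Nitter instances come and go, it helps to try several in order and fall back when one is down. A minimal failover sketch (the instance list here is illustrative only; availability changes frequently):

```python
import requests

# Illustrative list only - public instance availability changes frequently
FALLBACK_INSTANCES = ["nitter.net", "nitter.poast.org", "nitter.privacydev.net"]

def candidate_urls(username, instances=FALLBACK_INSTANCES):
    """Build profile URLs for each instance, in preference order."""
    return [f"https://{instance}/{username}" for instance in instances]

def fetch_profile_html(username, instances=FALLBACK_INSTANCES, timeout=10):
    """Return the HTML from the first instance that responds, or None."""
    for url in candidate_urls(username, instances):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.ok:
                return resp.text
        except requests.RequestException:
            continue  # instance is down or rate-limited; try the next one
    return None
```

The HTML returned can then be fed to the same BeautifulSoup parsing shown above.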
Option 3: Browser Automation with Playwright
For the most reliable results, automate a real browser:
```python
import asyncio
from playwright.async_api import async_playwright

async def scrape_x_profile(username, max_tweets=50):
    """Scrape tweets using headless browser automation."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        )
        page = await context.new_page()
        tweets = []

        # Capture the GraphQL responses the page fires as the timeline loads
        async def handle_response(response):
            if "UserTweets" in response.url:
                try:
                    data = await response.json()
                    extract_tweets(data, tweets)
                except Exception:
                    pass  # non-JSON or truncated response; skip it

        page.on("response", handle_response)
        await page.goto(f"https://x.com/{username}")

        # Scroll to trigger additional timeline requests
        for _ in range(5):
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await asyncio.sleep(2)

        await browser.close()
        return tweets[:max_tweets]

# Usage: tweets = asyncio.run(scrape_x_profile("some_user"))
```
```python
def extract_tweets(data, tweets):
    """Recursively extract tweet objects from a GraphQL response."""
    if isinstance(data, dict):
        if "full_text" in data:
            tweets.append({
                "text": data.get("full_text"),
                "retweet_count": data.get("retweet_count", 0),
                "favorite_count": data.get("favorite_count", 0),
            })
        for value in data.values():
            extract_tweets(value, tweets)
    elif isinstance(data, list):
        for item in data:
            extract_tweets(item, tweets)
```
Option 4: Alternative Social Data Sources
Don't limit yourself to X. Other platforms have more accessible data:
```python
import requests

# Bluesky - fully open AT Protocol
def search_bluesky(query, limit=25):
    resp = requests.get(
        "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts",
        params={"q": query, "limit": limit},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json().get("posts", [])

# Reddit - free API with reasonable limits
def search_reddit(query, subreddit="all"):
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/search.json",
        params={"q": query, "limit": 25, "sort": "new"},
        headers={"User-Agent": "DataCollector/1.0"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()["data"]["children"]
```
Handling Anti-Bot Protection
X invests heavily in bot detection. To scrape reliably:
- Use residential proxies — ThorData provides rotating residential IPs
- Rotate user agents and browser fingerprints
- Respect rate limits — 1-2 requests per second maximum
- Use a scraping API — ScraperAPI handles all anti-bot measures automatically
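The rotation and rate-limiting advice above can be sketched as a small wrapper around requests. This is a minimal illustration (the user-agent strings are placeholders, not a maintained fingerprint set):

```python
import itertools
import time

import requests

# Placeholder strings - in practice, rotate full, current browser UA strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

class PoliteSession:
    """Enforce a minimum delay between requests and rotate User-Agent headers."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._agents = itertools.cycle(USER_AGENTS)
        self._last = 0.0

    def _throttle(self):
        # Sleep just long enough to stay under 1/min_interval requests per second
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()

    def next_user_agent(self):
        return next(self._agents)

    def get(self, url, **kwargs):
        self._throttle()
        headers = kwargs.pop("headers", {})
        headers.setdefault("User-Agent", self.next_user_agent())
        return requests.get(url, headers=headers, **kwargs)
```

A `PoliteSession(min_interval=1.0)` drop-in for `requests.get` keeps you at or below one request per second without sprinkling `sleep` calls through your scraping code.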
Monitoring Your Scraping Jobs
Track success rates and failures with ScrapeOps. It gives you dashboards showing which endpoints are failing and why.
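If you want a lightweight, dependency-free starting point before adopting a hosted dashboard, a per-endpoint success counter is easy to roll yourself (a generic sketch, not the ScrapeOps API):

```python
from collections import defaultdict

class ScrapeStats:
    """Track per-endpoint success/failure counts in memory."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, endpoint, ok):
        """Log one request outcome for an endpoint."""
        self.counts[endpoint]["ok" if ok else "fail"] += 1

    def success_rate(self, endpoint):
        """Fraction of successful requests, or None if nothing recorded."""
        c = self.counts[endpoint]
        total = c["ok"] + c["fail"]
        return c["ok"] / total if total else None

    def report(self):
        """Success rate per endpoint, for a quick health check."""
        return {endpoint: self.success_rate(endpoint) for endpoint in self.counts}
```

Call `record()` after each request and dump `report()` periodically to see which endpoints are degrading.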
Legal Considerations
- Scraping public data has found some support in U.S. courts (hiQ v. LinkedIn), but that case was ultimately settled and the law remains unsettled
- Do not scrape private/protected content
- Respect robots.txt guidelines
- Do not overwhelm servers with requests
- Check each platform's Terms of Service
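Checking robots.txt before scraping is easy to automate with the standard library. A small sketch (the rules below are an example, not any platform's actual robots.txt):

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Example rules only - fetch the real file from https://<site>/robots.txt
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

In practice you would download the live `/robots.txt` once per site, cache it, and gate every request through a check like this.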
Conclusion
The X API paywall pushed developers toward creative alternatives. Whether you use Nitter instances, browser automation, or pivot to more open platforms like Bluesky, social data remains accessible to those willing to build the right tools.