The X API Pricing Problem
Twitter's API (now X) has become prohibitively expensive for most developers. The Basic tier costs $100/month for just 10,000 tweet reads per month. The Pro tier runs $5,000/month. For indie developers and researchers, these prices are a non-starter.
But social media data is still incredibly valuable — for sentiment analysis, trend detection, lead generation, and competitive intelligence. Let's explore the practical alternatives.
Option 1: Direct Web Scraping
X's web interface loads data via internal GraphQL endpoints. You can intercept these:
```python
import requests
from urllib.parse import quote

def search_tweets(query, count=20):
    """Search tweets via X's live-search page, fetched through a scraping API."""
    target = f"https://x.com/search?q={quote(query)}&f=live"
    resp = requests.get(
        "https://api.scraperapi.com",
        params={"api_key": "YOUR_KEY", "url": target, "render": "true"},
        timeout=60,
    )
    resp.raise_for_status()
    # parse_tweet_html is your own HTML parser for the rendered page
    return parse_tweet_html(resp.text)[:count]
```

Passing the target URL through `params` lets requests handle the URL encoding; only the search query itself needs explicit quoting.
Option 2: Nitter Instances
Nitter is an open-source alternative Twitter frontend that does not require authentication. Be aware that most public instances went offline in early 2024 after X closed the guest-account mechanism Nitter relied on, so check which instances are currently operational before depending on one:
```python
import requests
from bs4 import BeautifulSoup

def scrape_nitter(username, instance="nitter.net"):
    """Scrape recent tweets from a Nitter instance."""
    url = f"https://{instance}/{username}"
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    tweets = []
    for tweet in soup.select(".timeline-item"):
        content = tweet.select_one(".tweet-content")
        date = tweet.select_one(".tweet-date a")
        if content:
            tweets.append({
                "text": content.get_text(strip=True),
                "date": date.get("title") if date else None,
            })
    return tweets

tweets = scrape_nitter("elonmusk")
for t in tweets[:5]:
    print(f"{t['date']}: {t['text'][:100]}")
```
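Because public Nitter instances come and go, it helps to try several in order and fall back when one is down. A minimal failover sketch (the instance list here is illustrative only; availability changes frequently):

```python
import requests

# Illustrative list only - public instance availability changes frequently
FALLBACK_INSTANCES = ["nitter.net", "nitter.poast.org", "nitter.privacydev.net"]

def candidate_urls(username, instances=FALLBACK_INSTANCES):
    """Build profile URLs for each instance, in preference order."""
    return [f"https://{instance}/{username}" for instance in instances]

def fetch_profile_html(username, instances=FALLBACK_INSTANCES, timeout=10):
    """Return the HTML from the first instance that responds, or None."""
    for url in candidate_urls(username, instances):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.ok:
                return resp.text
        except requests.RequestException:
            continue  # instance is down or rate-limited; try the next one
    return None
```

The HTML returned can then be fed to the same BeautifulSoup parsing shown above.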
Option 3: Browser Automation with Playwright
For the most reliable results, automate a real browser:
```python
import asyncio
from playwright.async_api import async_playwright

async def scrape_x_profile(username, max_tweets=50):
    """Scrape tweets using headless browser automation."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        )
        page = await context.new_page()
        tweets = []

        # Capture the GraphQL responses the page fires as the timeline loads
        async def handle_response(response):
            if "UserTweets" in response.url:
                try:
                    data = await response.json()
                    extract_tweets(data, tweets)
                except Exception:
                    pass  # non-JSON or truncated response; skip it

        page.on("response", handle_response)
        await page.goto(f"https://x.com/{username}")

        # Scroll to trigger additional timeline requests
        for _ in range(5):
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await asyncio.sleep(2)

        await browser.close()
        return tweets[:max_tweets]

# Usage: tweets = asyncio.run(scrape_x_profile("some_user"))
```
```python
def extract_tweets(data, tweets):
    """Recursively extract tweet objects from a GraphQL response."""
    if isinstance(data, dict):
        if "full_text" in data:
            tweets.append({
                "text": data.get("full_text"),
                "retweet_count": data.get("retweet_count", 0),
                "favorite_count": data.get("favorite_count", 0),
            })
        for value in data.values():
            extract_tweets(value, tweets)
    elif isinstance(data, list):
        for item in data:
            extract_tweets(item, tweets)
```
Option 4: Alternative Social Data Sources
Don't limit yourself to X. Other platforms have more accessible data:
```python
import requests

# Bluesky - fully open AT Protocol
def search_bluesky(query, limit=25):
    resp = requests.get(
        "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts",
        params={"q": query, "limit": limit},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json().get("posts", [])

# Reddit - free API with reasonable limits
def search_reddit(query, subreddit="all"):
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/search.json",
        params={"q": query, "limit": 25, "sort": "new"},
        headers={"User-Agent": "DataCollector/1.0"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()["data"]["children"]
```
Handling Anti-Bot Protection
X invests heavily in bot detection. To scrape reliably:
- Use residential proxies — ThorData provides rotating residential IPs
- Rotate user agents and browser fingerprints
- Respect rate limits — 1-2 requests per second maximum
- Use a scraping API — ScraperAPI handles all anti-bot measures automatically
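The rotation and rate-limiting advice above can be sketched as a small wrapper around requests. This is a minimal illustration (the user-agent strings are placeholders, not a maintained fingerprint set):

```python
import itertools
import time

import requests

# Placeholder strings - in practice, rotate full, current browser UA strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

class PoliteSession:
    """Enforce a minimum delay between requests and rotate User-Agent headers."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._agents = itertools.cycle(USER_AGENTS)
        self._last = 0.0

    def _throttle(self):
        # Sleep just long enough to stay under 1/min_interval requests per second
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()

    def next_user_agent(self):
        return next(self._agents)

    def get(self, url, **kwargs):
        self._throttle()
        headers = kwargs.pop("headers", {})
        headers.setdefault("User-Agent", self.next_user_agent())
        return requests.get(url, headers=headers, **kwargs)
```

A `PoliteSession(min_interval=1.0)` drop-in for `requests.get` keeps you at or below one request per second without sprinkling `sleep` calls through your scraping code.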
Monitoring Your Scraping Jobs
Track success rates and failures with ScrapeOps. It gives you dashboards showing which endpoints are failing and why.
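If you want a lightweight, dependency-free starting point before adopting a hosted dashboard, a per-endpoint success counter is easy to roll yourself (a generic sketch, not the ScrapeOps API):

```python
from collections import defaultdict

class ScrapeStats:
    """Track per-endpoint success/failure counts in memory."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, endpoint, ok):
        """Log one request outcome for an endpoint."""
        self.counts[endpoint]["ok" if ok else "fail"] += 1

    def success_rate(self, endpoint):
        """Fraction of successful requests, or None if nothing recorded."""
        c = self.counts[endpoint]
        total = c["ok"] + c["fail"]
        return c["ok"] / total if total else None

    def report(self):
        """Success rate per endpoint, for a quick health check."""
        return {endpoint: self.success_rate(endpoint) for endpoint in self.counts}
```

Call `record()` after each request and dump `report()` periodically to see which endpoints are degrading.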
Legal Considerations
- Scraping public data has found some support in U.S. courts (hiQ v. LinkedIn), but that case was ultimately settled and the law remains unsettled
- Do not scrape private/protected content
- Respect robots.txt guidelines
- Do not overwhelm servers with requests
- Check each platform's Terms of Service
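Checking robots.txt before scraping is easy to automate with the standard library. A small sketch (the rules below are an example, not any platform's actual robots.txt):

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Example rules only - fetch the real file from https://<site>/robots.txt
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

In practice you would download the live `/robots.txt` once per site, cache it, and gate every request through a check like this.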
Conclusion
The X API paywall pushed developers toward creative alternatives. Whether you use Nitter instances, browser automation, or pivot to more open platforms like Bluesky, social data remains accessible to those willing to build the right tools.