DEV Community

agenthustler

How to Scrape Twitter/X in 2026: Tweets, Profiles, Trends, and Follower Data

Ever since Elon Musk killed off free API access in early 2023, scraping Twitter (now X) has become one of the most frustrating challenges in web scraping. The platform that was once the easiest social network to pull data from is now one of the hardest.

In this guide, I'll walk through every practical method available in 2026 — what works, what doesn't, what it costs, and where the traps are.

The Current State of Twitter/X Data Access

Let's be blunt: there is no free, reliable, officially-supported way to get Twitter data at scale in 2026.

Here's what happened:

  • February 2023: Twitter shut down free API v1.1 access
  • March 2023: API v2 moved to paid tiers ($100/month minimum for basic access)
  • Late 2023-2024: Aggressive anti-scraping measures rolled out (rate limiting, bot detection, IP bans)
  • 2025-2026: Continued tightening — even logged-in browser sessions get throttled after moderate usage

If anyone tells you "just use this free Python library and you're good," they're either outdated or lying. Every method has trade-offs.

Method 1: Twitter API v2 (Official, Paid)

The official API is the most reliable option, but it's expensive.

Pricing Tiers (as of 2026)

| Tier | Cost | Tweet reads/month | Write access |
| --- | --- | --- | --- |
| Free | $0 | 1 app, limited | Write only (1,500 tweets/month) |
| Basic | $100/mo | 10,000 reads | Yes |
| Pro | $5,000/mo | 1,000,000 reads | Yes |
| Enterprise | Custom | Unlimited | Full firehose |
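To put those tiers in perspective, here's the cost per tweet read implied by the paid rows above:

```python
# Back-of-the-envelope cost per tweet read, using the paid tiers in the table
tiers = {"Basic": (100, 10_000), "Pro": (5_000, 1_000_000)}

def cost_per_read(price_usd, reads):
    return price_usd / reads

for name, (price, reads) in tiers.items():
    print(f"{name}: ${cost_per_read(price, reads):.4f} per read")
# Basic: $0.0100 per read
# Pro: $0.0050 per read
```

A penny per tweet adds up fast, which is why the methods below exist at all.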

Basic Python Example

```python
import requests

BEARER_TOKEN = "your_bearer_token"

def search_recent_tweets(query, max_results=10):
    url = "https://api.twitter.com/2/tweets/search/recent"
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics,author_id",
        "expansions": "author_id",
        "user.fields": "username,name,public_metrics"
    }
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # surface auth/rate-limit errors instead of failing silently
    return response.json()

results = search_recent_tweets("python web scraping", max_results=10)
for tweet in results.get("data", []):
    print(f"{tweet['text'][:80]}...")
    print(f"  Likes: {tweet['public_metrics']['like_count']}")
```

Pros: Reliable, structured data, legal

Cons: $100/month minimum for reads, strict rate limits, limited historical data on Basic tier
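On the rate-limit point: v2 endpoints answer with HTTP 429 and an `x-rate-limit-reset` header (epoch seconds) once you exhaust a window. A minimal retry sketch — the helper names are mine, not part of the API:

```python
import time
import requests

def seconds_until_reset(reset_epoch, now=None):
    """How long to sleep before the rate-limit window reopens (at least 1s)."""
    now = time.time() if now is None else now
    return max(reset_epoch - now, 1)

def get_with_rate_limit(url, headers, params, max_retries=3):
    """GET with retries that honor Twitter's x-rate-limit-reset header on 429s."""
    for _ in range(max_retries):
        resp = requests.get(url, headers=headers, params=params)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        reset = int(resp.headers.get("x-rate-limit-reset", time.time() + 60))
        time.sleep(seconds_until_reset(reset))
    raise RuntimeError("still rate limited after retries")
```

Sleeping until the reset timestamp wastes less quota than blind exponential backoff.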

When to Use the Official API

  • You need real-time data (streaming endpoints)
  • You're building a product that depends on Twitter data
  • Compliance matters (academic research, business use)
  • You only need a small volume (< 10K tweets/month)

Method 2: Alternative Frontends (Nitter-Style)

Nitter was an open-source alternative Twitter frontend that let you view tweets without JavaScript or a Twitter account. At its peak, dozens of public instances existed.

Current Status (2026)

Most public Nitter instances are dead or unreliable. Twitter aggressively blocks the guest API endpoints that Nitter depended on. Some self-hosted instances still work intermittently, but:

  • They break every few weeks when Twitter changes something
  • Rate limits are severe
  • No guarantee of data completeness
  • Running your own instance requires constant maintenance

```python
# Example: attempting to use a Nitter instance (unreliable)
import requests
from bs4 import BeautifulSoup

def try_nitter_scrape(username, instance="nitter.net"):
    """Warning: Most instances are down. This is for illustration."""
    url = f"https://{instance}/{username}"
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == 200:
            soup = BeautifulSoup(resp.text, "html.parser")
            tweets = soup.select(".timeline-item .tweet-content")
            return [t.get_text(strip=True) for t in tweets]
    except Exception as e:
        print(f"Instance {instance} failed: {e}")
    return []
```

Verdict: Don't build anything that depends on Nitter in 2026. It's a cat-and-mouse game you'll lose.

Method 3: Browser Automation (Selenium/Playwright)

This approach uses a real browser to load Twitter, scroll through content, and extract data from the rendered page.

```python
from playwright.sync_api import sync_playwright
import time

def scrape_twitter_profile(username, max_tweets=20):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
        )
        page = context.new_page()

        # You'll likely need to log in first
        page.goto(f"https://x.com/{username}")
        time.sleep(3)  # Wait for dynamic content

        tweets = []
        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 10:
            # Extract tweet elements
            elements = page.query_selector_all('[data-testid="tweetText"]')
            for el in elements:
                text = el.inner_text()
                if text not in tweets:
                    tweets.append(text)

            # Scroll down
            page.evaluate("window.scrollBy(0, 800)")
            time.sleep(2)
            scroll_count += 1

        browser.close()
        return tweets[:max_tweets]
```

The Honest Truth About Browser Scraping Twitter

  1. You need to be logged in — Twitter shows almost nothing to logged-out visitors
  2. Account bans are common — Twitter detects automation patterns and suspends accounts
  3. It's slow — each page load takes seconds, scrolling takes more seconds
  4. Anti-bot detection — Twitter uses sophisticated fingerprinting
  5. Selectors break constantly — Twitter uses obfuscated class names that change regularly
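One way to soften the login problem: authenticate by hand once in a headed browser, save the session with Playwright's `storage_state`, and reuse it on later runs so automation never touches the login form. A sketch — the state-file path is my own placeholder:

```python
STATE_FILE = "x_auth_state.json"  # hypothetical path for the saved session

def save_login_state(state_file=STATE_FILE):
    """Open a visible browser, log in manually once, then persist cookies to disk."""
    from playwright.sync_api import sync_playwright  # pip install playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://x.com/login")
        page.wait_for_url("https://x.com/home", timeout=120_000)  # finish login by hand
        context.storage_state(path=state_file)
        browser.close()

def reuse_login_state(p, state_file=STATE_FILE):
    """Later runs start already authenticated, skipping the riskiest automation step."""
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(storage_state=state_file)
    return browser, context
```

Sessions still expire and accounts still get flagged, but this removes one of the most detectable automation patterns.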

Method 4: Proxy-Based Scraping Services

If you're serious about scraping Twitter at any scale, you'll need residential proxies to avoid IP bans.

ThorData offers residential proxy networks that rotate IPs automatically, which is essential when Twitter rate-limits per IP. For a more managed approach, ScraperAPI handles proxy rotation and browser rendering in one API call.

```python
# Using a proxy service to avoid IP bans
import requests
from urllib.parse import quote

SCRAPER_API_KEY = "your_key"

def scrape_with_proxy(url):
    # URL-encode the target so its own query string doesn't break the API call
    proxy_url = (
        f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}"
        f"&url={quote(url, safe='')}&render=true"
    )
    response = requests.get(proxy_url)
    return response.text

# Scrape a Twitter profile through proxy
html = scrape_with_proxy("https://x.com/elonmusk")
```
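If you wire a rotating residential gateway directly into requests instead of a managed API, the proxy mapping looks like this — the host, port, and credentials below are placeholders, not real endpoints:

```python
import requests

# Placeholder credentials and gateway -- substitute your provider's real values
PROXY_USER = "your_username"
PROXY_PASS = "your_password"
PROXY_GATEWAY = "proxy.example.com:8000"  # hypothetical rotating residential endpoint

def build_proxies():
    """requests-style proxy mapping; the gateway hands out a fresh IP per request."""
    proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_GATEWAY}"
    return {"http": proxy_url, "https": proxy_url}

def fetch_via_proxy(url, timeout=15):
    resp = requests.get(url, proxies=build_proxies(), timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

Most residential providers rotate the exit IP per request at a single gateway address, so the client code stays this simple.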

Method 5: Pre-Built Scraping Actors

If you don't want to build and maintain your own scraper, platforms like Apify offer pre-built actors that handle the complexity for you. I maintain some data collection actors on Apify that handle the infrastructure side — proxy rotation, browser management, anti-detection, and data formatting.

The advantage here is you get structured JSON output without dealing with any of the browser automation headaches.
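For a sense of what that looks like in code, here's a minimal sketch using the official apify-client library — the actor ID and input field names are illustrative assumptions; check your chosen actor's README for its real input schema:

```python
def build_run_input(handles, tweets_per_handle=50):
    """Input shape many Twitter actors accept (these field names are assumptions)."""
    return {"handles": list(handles), "tweetsDesired": tweets_per_handle}

def run_actor(token, actor_id, run_input):
    """Call a pre-built actor and return its dataset items as plain dicts."""
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Hypothetical usage -- actor ID and input depend on the actor you choose:
# tweets = run_actor("your_apify_token", "example/twitter-scraper",
#                    build_run_input(["nasa"]))
```

The actor handles proxies, browsers, and retries server-side; you just page through the resulting dataset.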

What Method Should You Choose?

Here's my honest recommendation:

| Scenario | Best Method | Monthly Cost |
| --- | --- | --- |
| Small volume, need compliance | Twitter API v2 Basic | $100 |
| Academic research | Twitter API v2 (Academic tier) | Apply for access |
| One-off data collection | Pre-built actors (Apify) | Pay per use |
| Large-scale monitoring | Twitter API Pro + proxies | $5,000+ |
| Quick prototype/experiment | Browser automation + proxies | $30-50 (proxy costs) |

Legal Considerations

This is not legal advice, but you should know:

  • Twitter's Terms of Service prohibit scraping
  • The 2022 hiQ v. LinkedIn ruling established that scraping publicly accessible data is not a CFAA violation — but Twitter's data isn't fully "publicly accessible" anymore (login walls)
  • The EU's GDPR applies to personal data regardless of how you collected it
  • If you're using scraped data commercially, consult a lawyer

Key Takeaways

  1. There's no free lunch. Every method costs either money, time, or both.
  2. The official API is the safest bet if you can afford it and the rate limits work for your use case.
  3. Browser automation works but requires constant maintenance and proxy infrastructure.
  4. Pre-built solutions save massive amounts of development time.
  5. Budget realistically — between proxies like ThorData, API costs, and infrastructure, plan for $50-200/month minimum for any serious data collection.

The days of `import tweepy; api.search()` for free are gone. But with the right approach and realistic expectations, you can still get the Twitter data you need.


Have questions about scraping Twitter/X? Drop a comment below — I'll share what's worked (and what hasn't) from my experience.
