DEV Community

agenthustler

How to Scrape Twitter/X in 2026: Public Data, Rate Limits, and What Still Works

Twitter/X scraping in 2026 is a minefield. After Elon Musk's aggressive API changes, rate limit crackdowns, and multiple lawsuits against scrapers, most of the old methods are dead. But public data extraction still works — if you know the current landscape.

This guide covers what actually works right now, what got killed, and how to scrape Twitter/X data without getting your IP banned or your account suspended.


The Current State of Twitter/X Data Access

Let's be clear about what changed:

  • Official API: The free tier is nearly useless (1,500 tweets/month read limit). Basic tier ($200/mo) gives you 10K tweets. Pro tier ($5,000/mo) for serious access.
  • Aggressive bot detection: Twitter now uses advanced fingerprinting, behavioral analysis, and ML-based detection.
  • Legal threats: Twitter/X has sued multiple scraping companies. They actively monitor for scraping activity.
  • Login walls: Most content now requires authentication to view.

What's still public and legal to access:

  • Public profiles and their tweet history
  • Public tweet content (when accessible without login)
  • Publicly visible engagement metrics
  • Trending topics and hashtags

Method 1: The Official API (When It Makes Sense)

Despite the cost, the official API is still the most reliable method for certain use cases.

import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"

def search_recent_tweets(query: str, max_results: int = 10):
    url = "https://api.x.com/2/tweets/search/recent"
    headers = {
        "Authorization": f"Bearer {BEARER_TOKEN}"
    }
    params = {
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics,author_id",
        "expansions": "author_id",
        "user.fields": "username,name,public_metrics"
    }

    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Search for recent tweets
results = search_recent_tweets("python web scraping", 10)
for tweet in results.get("data", []):
    print(f"{tweet['text'][:100]}...")
    print(f"  Likes: {tweet['public_metrics']['like_count']}")
    print()

When the API makes sense:

  • You need < 10K tweets/month (Basic tier at $200)
  • You need real-time data (streaming endpoints)
  • Compliance and legal safety matter (enterprise use)
  • You need guaranteed uptime and structured data

When it doesn't:

  • Budget-constrained projects
  • Historical data (API only goes back 7 days on Basic)
  • Large-scale data collection
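If you do go the API route, the v2 endpoints advertise their limits in the x-rate-limit-* response headers, so you can back off precisely instead of guessing. A minimal sketch of that pattern — the helper names here are mine, not part of any SDK:

```python
import time
import requests

def seconds_until_reset(headers, default: float = 60.0) -> float:
    """Compute how long to sleep from the x-rate-limit-reset header
    (a Unix timestamp), falling back to a default when it's absent."""
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return default
    return max(float(reset) - time.time(), 0.0)

def get_with_backoff(url, headers, params, max_retries: int = 3):
    """Retry a v2 API call when it returns 429, sleeping until the
    advertised reset time before trying again."""
    for _ in range(max_retries):
        resp = requests.get(url, headers=headers, params=params, timeout=30)
        if resp.status_code != 429:
            return resp
        time.sleep(seconds_until_reset(resp.headers))
    return resp
```

Drop this in place of the plain `requests.get` in `search_recent_tweets` and a 429 becomes a pause instead of an error.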

Method 2: Managed Scraping Services

This is what I actually recommend for most people. Let someone else deal with the proxy rotation, CAPTCHA solving, and detection evasion.

ScraperAPI

ScraperAPI handles the hard parts — rotating proxies, browser rendering, and anti-bot bypass. You send a URL, get back the HTML.

import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = "YOUR_KEY"

def scrape_twitter_profile(username: str):
    """Scrape a public Twitter profile via ScraperAPI."""
    target_url = f"https://x.com/{username}"

    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": SCRAPER_API_KEY,
            "url": target_url,
            "render": "true",  # Enable JS rendering
            "country_code": "us"
        },
        timeout=60
    )

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        return soup
    return None

Pros: No proxy management, automatic retries, scales easily
Cons: Cost per request, depends on their infrastructure
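Once ScraperAPI hands you back rendered HTML, extraction is a separate problem. A hedged sketch using the data-testid hooks X currently ships in its markup — these attributes are not a stable API, and `extract_tweet_texts` is my own helper name, so verify the selectors against live HTML before relying on them:

```python
from bs4 import BeautifulSoup

def extract_tweet_texts(html: str) -> list:
    """Pull visible tweet text out of a rendered profile page.

    Assumes X still wraps tweet bodies in data-testid="tweetText"
    containers — check this against the live page, it changes.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        el.get_text(" ", strip=True)
        for el in soup.select('[data-testid="tweetText"]')
    ]
```

Feed it the `response.text` from `scrape_twitter_profile` and you get a plain list of strings, ready for storage.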

ScrapeOps

ScrapeOps offers a proxy aggregator and monitoring dashboard that's particularly useful for Twitter scraping. They route your requests through the best-performing proxy for each target.

import requests

SCRAPEOPS_API_KEY = "YOUR_KEY"

def scrape_with_scrapeops(url: str):
    response = requests.get(
        "https://proxy.scrapeops.io/v1/",
        params={
            "api_key": SCRAPEOPS_API_KEY,
            "url": url,
            "render_js": "true",
            "residential": "true"
        },
        timeout=60
    )
    return response

# Scrape a public tweet
result = scrape_with_scrapeops(
    "https://x.com/elonmusk/status/1234567890"
)
print(f"Status: {result.status_code}")

What makes ScrapeOps stand out is their proxy benchmarking — they test proxy providers against specific targets and route through whichever performs best. For Twitter specifically, this matters because detection methods change frequently.
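Managed proxies also make it reasonably safe to parallelize, since the provider spreads requests across IPs. A small fan-out sketch — `scrape_batch` is a hypothetical helper that accepts any fetch callable, including `scrape_with_scrapeops` above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_batch(urls, fetch, max_workers: int = 5) -> dict:
    """Fan a list of URLs out over a thread pool.

    `fetch` is any callable taking a URL; exceptions are captured
    per URL instead of killing the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Keep `max_workers` modest — the proxy layer hides your IP, but hammering the target still burns through your request quota.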


Method 3: Browser Automation with Stealth

For maximum control, you can run a headless browser with anti-detection measures. This is the most flexible approach but requires the most maintenance.

from playwright.async_api import async_playwright
import asyncio

async def scrape_twitter_search(query: str, max_tweets: int = 50):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
            ]
        )

        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/131.0.0.0 Safari/537.36"
            ),
        )

        # Remove automation indicators
        page = await context.new_page()
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)

        await page.goto(
            f"https://x.com/search?q={query}&src=typed_query",
            wait_until="domcontentloaded"  # "networkidle" rarely fires on x.com
        )
        # Wait for the first tweets to render (raises on a login wall)
        await page.wait_for_selector('article[data-testid="tweet"]')

        # Scroll and collect tweets, deduplicating by text
        tweets = []
        seen = set()
        last_height = 0

        while len(tweets) < max_tweets:
            # Extract visible tweets
            tweet_elements = await page.query_selector_all(
                'article[data-testid="tweet"]'
            )

            for element in tweet_elements:
                text_el = await element.query_selector(
                    '[data-testid="tweetText"]'
                )
                if text_el:
                    text = await text_el.inner_text()
                    if text not in seen:
                        seen.add(text)
                        tweets.append({"text": text})

            # Scroll down
            await page.evaluate(
                "window.scrollBy(0, window.innerHeight)"
            )
            await page.wait_for_timeout(2000)

            new_height = await page.evaluate(
                "document.body.scrollHeight"
            )
            if new_height == last_height:
                break
            last_height = new_height

        await browser.close()
        return tweets[:max_tweets]

# Run the scraper
tweets = asyncio.run(
    scrape_twitter_search("web scraping 2026", 20)
)
for t in tweets:
    print(t["text"][:100])

Important caveats with browser automation:

  • Twitter aggressively detects headless browsers
  • You need residential proxies (datacenter IPs are instantly blocked)
  • Login is required for most content — and logging in with automation violates ToS
  • Sessions get invalidated frequently
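If you do proceed, Playwright can route the whole browser through a residential proxy via the `proxy` option on `launch()`. A sketch of building those options — the gateway address and credentials are placeholders for whatever your provider gives you, and `proxy_launch_options` is my own helper name:

```python
from typing import Optional

def proxy_launch_options(server: str,
                         username: Optional[str] = None,
                         password: Optional[str] = None) -> dict:
    """Build keyword arguments for Playwright's launch() with a proxy.

    `server` is your provider's gateway, e.g.
    "http://residential.example-proxy.com:8000" (a placeholder).
    """
    proxy = {"server": server}
    if username:
        proxy["username"] = username
        proxy["password"] = password
    return {"headless": True, "proxy": proxy}
```

Then in the scraper above: `browser = await p.chromium.launch(**proxy_launch_options("http://gateway:8000", "user", "pass"))`.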

Method 4: Alternative Data Sources

Sometimes the best way to get Twitter data isn't scraping Twitter directly.

Nitter Instances

Nitter is an open-source Twitter frontend. Some public instances still work:

import requests
from bs4 import BeautifulSoup

def search_via_nitter(query: str, instance: str = "nitter.net"):
    """Try multiple Nitter instances as fallback."""
    instances = [
        instance,
        "nitter.privacydev.net",
        "nitter.poast.org",
    ]

    for inst in instances:
        try:
            url = f"https://{inst}/search?q={query}"
            resp = requests.get(url, timeout=10)
            if resp.status_code == 200:
                soup = BeautifulSoup(resp.text, "html.parser")
                tweets = soup.select(".tweet-content")
                return [t.get_text() for t in tweets]
        except Exception:
            continue
    return []

Reality check: Nitter instances are unreliable in 2026. Many have shut down. Don't build a production system on them.

Google Cache / Archive.org

For historical tweets, search engines and web archives sometimes have cached versions:

  • site:x.com OR site:twitter.com "your search term" on Google
  • Wayback Machine API for archived tweet pages
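The Wayback Machine exposes a simple availability endpoint that returns the closest archived snapshot for a URL. A standard-library-only sketch — `availability_url` and `closest_snapshot` are my own helper names:

```python
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"

def availability_url(page_url: str, timestamp: str = None) -> str:
    """Build an availability query; timestamp is YYYYMMDD to ask
    for the snapshot nearest that date."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urllib.parse.urlencode(params)}"

def closest_snapshot(page_url: str):
    """Return the archived snapshot URL for a page, or None."""
    with urllib.request.urlopen(availability_url(page_url), timeout=15) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None
```

Snapshots of tweet pages are hit-or-miss, but for deleted or historical tweets this is sometimes the only lawful source left.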

Academic Access

Twitter/X still offers a research access tier for qualified academics, though eligibility has tightened repeatedly since the old free Academic Research API was retired. If you're affiliated with a university, it can give you broader access than the commercial tiers — check the current requirements before planning a project around it.


Rate Limits and How to Handle Them

Regardless of your method, you need to respect rate limits. Here's a reusable rate limiter:

import time
import random
from collections import deque

class RateLimiter:
    def __init__(
        self,
        max_requests: int,
        time_window: int,
        jitter: float = 0.5
    ):
        self.max_requests = max_requests
        self.time_window = time_window  # seconds
        self.jitter = jitter
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()

        # Remove old requests outside the window
        while (
            self.requests
            and self.requests[0] < now - self.time_window
        ):
            self.requests.popleft()

        if len(self.requests) >= self.max_requests:
            sleep_time = (
                self.requests[0]
                + self.time_window
                - now
                + random.uniform(0, self.jitter)
            )
            print(f"Rate limit — sleeping {sleep_time:.1f}s")
            time.sleep(sleep_time)

        self.requests.append(time.time())

# Usage
limiter = RateLimiter(
    max_requests=30, time_window=60, jitter=2.0
)

urls_to_scrape = ["https://x.com/user1", "https://x.com/user2"]

for url in urls_to_scrape:
    limiter.wait_if_needed()
    # ... make your request here
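If you prefer not to call `wait_if_needed()` by hand at every call site, the same sliding-window logic works as a decorator. A self-contained sketch (`rate_limited` is my own name, not a library function):

```python
import time
import random
from collections import deque
from functools import wraps

def rate_limited(max_requests: int, time_window: float, jitter: float = 0.5):
    """Decorator: allow at most max_requests calls per time_window
    seconds, sleeping (plus random jitter) when the window is full."""
    calls = deque()

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.time()
            # Drop calls that have aged out of the window
            while calls and calls[0] < now - time_window:
                calls.popleft()
            if len(calls) >= max_requests:
                time.sleep(
                    calls[0] + time_window - now
                    + random.uniform(0, jitter)
                )
            calls.append(time.time())
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(max_requests=2, time_window=1.0, jitter=0.0)
def fetch(url: str) -> str:
    # Stand-in for a real request
    return f"fetched {url}"
```

Every call to `fetch` now self-throttles; the third call within a second blocks until the window slides.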

What Doesn't Work Anymore

Let's save you time. These methods are dead or dying:

  1. snscrape — The most popular Twitter scraping library. Broken since mid-2023 and abandoned. Don't use it.
  2. Tweepy free tier — Rate limits make it impractical for any real data collection.
  3. Simple HTTP requests without rendering — Twitter is a fully JavaScript-rendered SPA. Raw HTTP gets you nothing useful.
  4. Free proxy lists — Every free proxy list is full of dead or compromised IPs. Use paid services.
  5. Guest tokens — Twitter killed unauthenticated API access. Guest tokens no longer work for most endpoints.

Ethical and Legal Considerations

I want to be straightforward about this:

  • Public data is generally legal to access in most jurisdictions (see hiQ v. LinkedIn)
  • Terms of Service violations are not criminal, but can lead to account bans and civil liability
  • The CFAA (in the US) is a gray area — the Van Buren decision narrowed its scope, but scraping behind auth could still be risky
  • GDPR (in the EU) applies to personal data regardless of how you collected it
  • Twitter's specific stance: They've sued companies for scraping and won injunctions. Individual hobbyists are unlikely targets, but commercial operations should be careful.

My recommendation: Use the official API when you can afford it. Use managed services like ScraperAPI or ScrapeOps when you can't. Only go the browser automation route if you truly need it and understand the risks.


Recommended Stack for Twitter/X Scraping in 2026

  • Primary data source: Official API (if budget allows)
  • Proxy service: ScraperAPI or ScrapeOps for managed proxies
  • Browser automation: Playwright with stealth plugins
  • Rate limiting: Custom rate limiter (code above)
  • Data storage: PostgreSQL or MongoDB
  • Monitoring: Track success rates per method
  • Fallback: Always have 2+ methods ready
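The fallback advice deserves code: a tiny dispatcher that tries each method in order and reports which one succeeded, which also produces the per-method data the monitoring advice asks for. A sketch — `fetch_with_fallback` and the method names are illustrative:

```python
def fetch_with_fallback(url: str, methods):
    """Try each (name, fetch_fn) pair in order.

    Returns (method_name, result) for the first success, so callers
    can log which source actually delivered. Falsy results count as
    failures, since a dead scraper often returns an empty page.
    """
    errors = {}
    for name, fetch in methods:
        try:
            result = fetch(url)
            if result:
                return name, result
        except Exception as exc:
            errors[name] = exc
    raise RuntimeError(f"all methods failed for {url}: {errors}")
```

Wire your official-API client, ScraperAPI wrapper, and Playwright scraper in as the method list, and a single outage stops being a single point of failure.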

Quick Start: Minimal Working Example

If you just want to get started quickly, here's the simplest path:

import requests
import json

# Using ScraperAPI — simplest approach
API_KEY = "YOUR_SCRAPERAPI_KEY"

def get_tweet_page(tweet_url: str) -> str:
    """Fetch rendered tweet page via ScraperAPI."""
    resp = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": tweet_url,
            "render": "true"
        },
        timeout=60
    )
    return resp.text if resp.status_code == 200 else ""

# Fetch a public tweet
html = get_tweet_page(
    "https://x.com/elonmusk/status/1234567890"
)
if html:
    print(f"Got {len(html)} bytes of rendered HTML")
    # Parse with BeautifulSoup from here

The Twitter/X scraping landscape will keep changing. The key is building flexible systems that can swap between data sources when one breaks. Don't over-invest in any single method — it will break eventually.


Have a method that still works? Found something I missed? Share it in the comments — the community benefits when we share what's actually working right now.
