
Vhub Systems

How to Scrape Twitter/X Profile Data Without the API in 2026

Scraping Twitter/X public profile data has gotten harder since Elon Musk's API pricing changes ($42,000/month for Enterprise access). Here's what still works in 2026.

What's publicly accessible without the API

Twitter/X still renders public profile data without authentication:

  • Display name, handle, bio
  • Follower/following counts
  • Tweet count, join date
  • Profile and banner images
  • Location (if set)

What requires authentication: tweet content beyond the first ~20, DMs, full timelines.
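Before picking a method, it helps to pin down the shape of the data you're after. This is just an illustrative container for the public fields listed above, not an official schema — field names and defaults are my own:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TwitterProfile:
    """Illustrative container for the publicly visible profile fields."""
    name: str
    handle: str
    bio: str = ""
    followers: int = 0
    following: int = 0
    tweet_count: int = 0
    join_date: Optional[str] = None   # e.g. "March 2018"
    location: Optional[str] = None    # only present if the user set one
    avatar_url: Optional[str] = None
    banner_url: Optional[str] = None
```

Each method below fills in a different subset of these fields — the syndication endpoint covers counts and bio, while a headless browser can recover join date and location too.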

Method 1: Twitter's syndication API (simplest)

Twitter has an unofficial data endpoint used by embedded follow buttons. It returns basic profile data with no auth required:

import requests

def get_twitter_profile(username: str) -> dict:
    url = "https://cdn.syndication.twimg.com/widgets/followbutton/info.json"
    params = {"screen_names": username}
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Referer": "https://platform.twitter.com/"
    }

    response = requests.get(url, params=params, headers=headers, timeout=10)
    if response.status_code == 200:
        data = response.json()
        return data[0] if data else {}
    return {}

profile = get_twitter_profile("vhubsystems")
# Returns: name, screen_name, followers_count, following_count, description
print(profile)

This works for basic profile data and doesn't require auth or proxies.

Method 2: Playwright with stealth

For more data (pinned tweet, join date, location), you need a browser:

from playwright.async_api import async_playwright
import asyncio

async def scrape_twitter_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"]
        )

        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/122.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 800}
        )
        page = await context.new_page()

        # Mask automation
        await page.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        )

        # twitter.com now redirects to x.com; go there directly
        await page.goto(f"https://x.com/{username}", wait_until="networkidle")

        data = await page.evaluate("""
            () => ({
                name: document.querySelector('[data-testid="UserName"] span')?.innerText,
                bio: document.querySelector('[data-testid="UserDescription"]')?.innerText,
                followers: document.querySelector('a[href$="/followers"] span span')?.innerText,
                following: document.querySelector('a[href$="/following"] span span')?.innerText
            })
        """)

        await browser.close()
        return data

profile = asyncio.run(scrape_twitter_profile("elonmusk"))
print(profile)

Method 3: Nitter mirrors

Nitter instances re-render Twitter content without JavaScript:

import requests
from bs4 import BeautifulSoup

NITTER_INSTANCES = [
    "https://nitter.poast.org",
    "https://nitter.privacydev.net",
]

def scrape_via_nitter(username: str) -> dict:
    for instance in NITTER_INSTANCES:
        try:
            r = requests.get(f"{instance}/{username}", timeout=10)
            if r.status_code == 200:
                soup = BeautifulSoup(r.text, "html.parser")
                # select_one returns None on a miss; guard before reading text
                def text(selector: str) -> str:
                    el = soup.select_one(selector)
                    return el.get_text(strip=True) if el else ""

                return {
                    "name": text(".profile-card-fullname"),
                    "bio": text(".profile-bio"),
                    "followers": text(".followers"),
                }
        except Exception:
            continue
    return {}

Nitter instances go offline frequently — don't rely on them for production.

Method 4: Pre-built actor (recommended for scale)

The Twitter Profile Scraper on Apify handles stealth browsing, proxy rotation, and rate limiting automatically. Input usernames or profile URLs, get structured JSON.
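Calling an actor from Python takes a few lines with the `apify-client` package. A minimal sketch — the actor ID below is a placeholder, and the input key is a guess; check the actor's documented input schema before relying on it:

```python
def build_run_input(usernames: list[str]) -> dict:
    """Shape the actor input. The "usernames" key is an assumption —
    check the actor's input schema on Apify."""
    return {"usernames": usernames}

def fetch_profiles(token: str, usernames: list[str]) -> list[dict]:
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token)
    # "vhubsystems/twitter-profile-scraper" is a placeholder actor ID
    run = client.actor("vhubsystems/twitter-profile-scraper").call(
        run_input=build_run_input(usernames)
    )
    # Actor results land in a dataset; stream them back as dicts
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

The actor handles proxies and retries server-side, so the client code stays this small regardless of volume.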

Sample output:

{
  "username": "example",
  "name": "Example User",
  "bio": "Building things",
  "followers": 15200,
  "following": 420,
  "tweets": 3847,
  "joinDate": "March 2018",
  "location": "San Francisco, CA",
  "verified": false
}

231+ production runs. Pay-per-result pricing.

Rate limits and proxies

Twitter blocks at scale. For 100+ profiles:

  • Residential proxies required (datacenter IPs blocked within minutes)
  • 5-10 second delays between requests
  • Session warm-up (visit homepage first)
  • Rotating user agents

For 1,000+: use the managed actor instead.
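The pacing and rotation rules above can be sketched as a small driver loop. `fetch` stands in for whichever request function you use (Method 1, 2, or 3); the session warm-up step is omitted for brevity, and the user-agent strings are examples:

```python
import itertools
import random
import time

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0",
]

def jittered_delay(base_min: float = 5.0, base_max: float = 10.0) -> float:
    """Random wait inside the 5-10 second window suggested above."""
    return random.uniform(base_min, base_max)

def scrape_politely(usernames, fetch, sleep=time.sleep):
    """Call fetch(username, user_agent) for each profile, rotating
    user agents and sleeping a randomized delay between requests."""
    ua_cycle = itertools.cycle(USER_AGENTS)
    results = {}
    for i, username in enumerate(usernames):
        if i > 0:
            sleep(jittered_delay())
        results[username] = fetch(username, next(ua_cycle))
    return results
```

Injecting `sleep` as a parameter keeps the loop testable; in production the default `time.sleep` applies the real delay, and you'd route `fetch` through your residential proxy pool.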

Use cases

  • Lead qualification: verify prospect Twitter presence before outreach
  • Influencer research: find accounts in your niche by follower range + bio keywords
  • Competitor monitoring: track follower growth over time
  • Market research: analyze bio keywords across a cohort

n8n AI Automation Pack ($39) — 5 production-ready workflows

Skip the maintenance

Pre-built and production-tested — 35+ scrapers in one bundle:

Apify Scrapers Bundle — $29 one-time

Includes Twitter Profile Scraper and 34 others. Instant download.
