How to Scrape Threads Profiles and Posts Without the API in 2026

#webdev #python #socialmedia #tutorial

Meta's Threads platform has grown to 350M+ users but offers no public API for data extraction. If you want follower counts, post engagement, or profile data at scale, you need to scrape it.

Here's what works in 2026.

Why Scraping Threads Is Different From Instagram

Threads is built on Meta's infrastructure, which means:

No public API (ActivityPub federation exists but doesn't expose profile data)
GraphQL API under the hood (similar to Instagram, same WAF)
Session-based rate limits: requests without a valid session are throttled quickly
JavaScript-rendered content: most data loads after the initial HTML

The good news: Threads loads faster than Instagram and has less aggressive bot detection for profile data.

What You Can Extract

Field	Available
Username, display name, bio	✅
Follower count	✅
Following count	✅
Verified status	✅
Post text content	✅
Post likes, replies, reposts	✅
Post timestamps	✅
Profile picture URL	✅
External links in bio	✅
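
One way to keep these fields consistent across the methods below is a small record type. This is a minimal sketch, not part of any Threads client: the class name `ThreadsProfile` and the helper `from_raw` are hypothetical, the dictionary keys mirror the ones used in Method 1, and the sample values are placeholders.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ThreadsProfile:
    """Hypothetical container for the profile fields used in Method 1."""
    username: Optional[str] = None
    full_name: Optional[str] = None
    biography: Optional[str] = None
    follower_count: Optional[int] = None
    following_count: Optional[int] = None
    is_verified: Optional[bool] = None

    @classmethod
    def from_raw(cls, raw: Optional[dict]) -> "ThreadsProfile":
        # Copy only the known keys; missing keys stay None, unknown keys are dropped.
        raw = raw or {}
        return cls(**{name: raw.get(name) for name in cls.__dataclass_fields__})


# Placeholder input, not real data.
profile = ThreadsProfile.from_raw({"username": "example_user", "follower_count": 123, "extra": 1})
print(asdict(profile))
```

Normalizing early means downstream code can rely on every key being present, even when a scrape returns a partial record.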

Method 1: Direct API Approach (Python)

Threads uses a private web API. With a valid session cookie, you can call it directly. The endpoint and response fields below are undocumented and may change without notice:

import json

import requests

# Value of the sessionid cookie from a logged-in browser session
SESSION_COOKIE = "your_sessionid_cookie_here"

def get_threads_profile(username):
    """Return basic profile fields, or None if the request fails."""
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
        "Accept": "application/json",
        "X-IG-App-ID": "238260118697367",
        "Cookie": f"sessionid={SESSION_COOKIE}",
    }
    url = f"https://www.threads.net/api/v1/users/web_profile_info/?username={username}"
    # Always set a timeout so a stalled connection cannot hang the script
    r = requests.get(url, headers=headers, timeout=10)
    if r.status_code != 200:
        return None
    try:
        payload = r.json()
    except ValueError:
        return None  # body was not JSON, e.g. a login page
    # "data" or "user" may be missing or null, so fall back to empty dicts
    user_data = (payload.get("data") or {}).get("user") or {}
    return {
        "username": user_data.get("username"),
        "full_name": user_data.get("full_name"),
        "biography": user_data.get("biography"),
        "follower_count": user_data.get("follower_count"),
        "following_count": user_data.get("following_count"),
        "is_verified": user_data.get("is_verified"),
    }

profile = get_threads_profile("zuck")
print(json.dumps(profile, indent=2))
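
Hardcoding a session cookie makes it easy to leak when the script is shared. As a hedged alternative, the sketch below reads it from an environment variable. The variable name `THREADS_SESSIONID` and the helper `build_headers` are hypothetical choices for this example; the header values are the ones used above.

```python
import os


def build_headers(env_var: str = "THREADS_SESSIONID") -> dict:
    """Build request headers, reading the session cookie from the environment.

    The default variable name is an assumption made for this sketch.
    """
    session_id = os.environ.get(env_var)
    if not session_id:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
        "Accept": "application/json",
        "X-IG-App-ID": "238260118697367",
        "Cookie": f"sessionid={session_id}",
    }
```

With this in place, the request becomes `requests.get(url, headers=build_headers(), timeout=10)`, and the cookie never appears in the source file.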

Method 2: Playwright with Stealth Mode

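Post content requires JavaScript execution. The script below loads the profile in a headless browser with a mobile user agent and viewport, then captures the GraphQL responses as the page scrolls. It does not load a separate stealth plugin: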

from playwright.sync_api import sync_playwright

def scrape_threads_posts(username, max_posts=20):
    posts = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
            viewport={"width": 390, "height": 844}
        )
        def handle_response(response):
            # Only inspect the GraphQL calls made by the Threads web app
            if "threads_web" in response.url and "graphql" in response.url:
                try:
                    data = response.json()
                    items = ((data.get("data") or {}).get("mediaData") or {}).get("threads") or []
                    for item in items:
                        for ti in item.get("thread_items") or []:
                            post = ti.get("post") or {}
                            if post:
                                posts.append({
                                    # caption can be null for posts without text
                                    "text": (post.get("caption") or {}).get("text", ""),
                                    "likes": post.get("like_count", 0),
                                    "replies": post.get("reply_count", 0),
                                })
                except Exception:
                    pass  # skip non-JSON bodies and unexpected payload shapes
        page = context.new_page()
        page.on("response", handle_response)
        page.goto(f"https://www.threads.net/@{username}")
        # Use wait_for_timeout rather than time.sleep so Playwright keeps
        # processing events (including our response handler) while waiting
        page.wait_for_timeout(3000)
        for _ in range(3):
            # Scroll to the bottom to trigger loading of more posts
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)
        browser.close()
    return posts[:max_posts]
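
The parsing step inside the response handler is easier to check when it lives in a plain function that runs without a browser. The sketch below assumes the same payload shape as the script above (`data.mediaData.threads[].thread_items[].post`). The function name `extract_posts` is hypothetical, and the sample payload is hand-written to match that shape, not captured from Threads.

```python
def extract_posts(payload: dict) -> list:
    """Pull text, likes, and replies out of one GraphQL response body."""
    posts = []
    threads = ((payload.get("data") or {}).get("mediaData") or {}).get("threads") or []
    for thread in threads:
        for item in thread.get("thread_items") or []:
            post = item.get("post") or {}
            if not post:
                continue
            caption = post.get("caption") or {}  # tolerate a missing or null caption
            posts.append({
                "text": caption.get("text", ""),
                "likes": post.get("like_count", 0),
                "replies": post.get("reply_count", 0),
            })
    return posts


# Hand-written sample in the assumed shape; not real data.
sample = {"data": {"mediaData": {"threads": [
    {"thread_items": [
        {"post": {"caption": {"text": "hello"}, "like_count": 5, "reply_count": 1}},
        {"post": {"caption": None, "like_count": 2}},
    ]},
]}}}
print(extract_posts(sample))
```

Saving a few real responses to disk and replaying them through a function like this makes it quick to notice when the payload structure changes.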

Method 3: Apify Actor (Easiest)

The Threads Profile Scraper handles session management and proxy rotation automatically.

import time

import requests

API_TOKEN = "YOUR_TOKEN"
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# Start the actor run
run = requests.post(
    "https://api.apify.com/v2/acts/lanky_quantifier~threads-profile-scraper/runs",
    headers=HEADERS,
    json={"usernames": ["zuck", "mosseri"], "maxPostsPerProfile": 50},
    timeout=30,
).json()["data"]

# Poll until the run reaches a terminal state
while True:
    status = requests.get(
        f"https://api.apify.com/v2/actor-runs/{run['id']}",
        headers=HEADERS,
        timeout=30,
    ).json()["data"]["status"]
    if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
        break
    time.sleep(5)

# Do not read the dataset of a run that did not finish cleanly
if status != "SUCCEEDED":
    raise RuntimeError(f"Actor run ended with status {status}")

results = requests.get(
    f"https://api.apify.com/v2/actor-runs/{run['id']}/dataset/items",
    headers=HEADERS,
    timeout=30,
).json()
for p in results:
    print(f"@{p['username']}: {p['follower_count']:,} followers")

Cost: ~$0.50 per 1,000 profiles scraped.
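
For budgeting, the quoted rate converts to a simple estimate. This is a rough sketch that takes the approximate $0.50 per 1,000 profiles figure at face value; the function name `estimate_cost` is hypothetical, and actual billing may differ.

```python
def estimate_cost(num_profiles: int, price_per_1000: float = 0.50) -> float:
    """Estimate spend in dollars from the approximate per-1,000-profile rate."""
    return round(num_profiles / 1000 * price_per_1000, 2)


# Example: 25,000 profiles at the quoted rate
print(estimate_cost(25_000))  # 12.5
```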

Rate Limits and Anti-Detection

Behavior	Risk Level
>100 requests/min without session	🔴 Block
Datacenter IPs	🟡 Medium
Residential IPs with session	🟢 Low
Mobile user-agent + session	🟢 Low

Use mobile user-agents (Threads is mobile-first) and residential proxies for sustained scraping.
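
Staying under a request ceiling is easiest with a small client-side throttle. The sketch below spaces calls evenly and adds optional random jitter. The class name `Throttle` and the example values (30 requests per minute, 0.5 seconds of jitter) are assumptions chosen for illustration, not thresholds published by Threads.

```python
import random
import time


class Throttle:
    """Space out calls so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute: int, jitter: float = 0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.jitter = jitter      # upper bound of extra random delay, in seconds
        self.clock = clock        # injectable for testing
        self.sleep = sleep        # injectable for testing
        self.last_call = None

    def wait(self) -> float:
        """Sleep if needed before the next request; return the delay used."""
        delay = 0.0
        if self.last_call is not None:
            elapsed = self.clock() - self.last_call
            delay = max(0.0, self.min_interval - elapsed)
        delay += random.uniform(0, self.jitter)
        if delay > 0:
            self.sleep(delay)
        self.last_call = self.clock()
        return delay
```

A typical use is to create `throttle = Throttle(max_per_minute=30, jitter=0.5)` once and call `throttle.wait()` immediately before each request.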

Use Cases

Influencer research: Track follower growth before partnerships
Competitor monitoring: Watch brand accounts for content strategy changes
Trend detection: Find which topics drive highest engagement
Lead generation: Find professionals by keywords in their bio
Market research: Track public sentiment on product launches
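
Two of these use cases reduce to short filters over the scraped records. The sketch below assumes profile dictionaries shaped like the output of Method 1 and post dictionaries shaped like the output of Method 2. The function names `match_bio` and `top_posts` are hypothetical, and the sample records are placeholders.

```python
def match_bio(profiles, keywords):
    """Lead generation: keep profiles whose bio contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        p for p in profiles
        if any(k in (p.get("biography") or "").lower() for k in wanted)
    ]


def top_posts(posts, n=3):
    """Trend detection: rank posts by likes plus replies, highest first."""
    return sorted(
        posts,
        key=lambda p: p.get("likes", 0) + p.get("replies", 0),
        reverse=True,
    )[:n]


# Placeholder records, not real data.
profiles = [
    {"username": "a", "biography": "Data Engineer at ExampleCo"},
    {"username": "b", "biography": None},
    {"username": "c", "biography": "Coffee and hiking"},
]
posts = [
    {"text": "first", "likes": 10, "replies": 2},
    {"text": "second", "likes": 3, "replies": 30},
    {"text": "third", "likes": 1, "replies": 0},
]
print([p["username"] for p in match_bio(profiles, ["engineer"])])
print([p["text"] for p in top_posts(posts, n=2)])
```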

The right method depends on your scale. For one-time research, use the direct API approach. For ongoing monitoring of hundreds of profiles, use Apify's managed scraper to avoid session rotation headaches.
