DEV Community

Vhub Systems

How to Scrape Threads Profiles and Posts Without the API in 2026

Meta's Threads platform has grown to 350M+ users, but it offers no public API for extracting arbitrary profile data in bulk. If you want follower counts, post engagement, or profile data at scale, you need to scrape it.

Here's what works in 2026.

Why Scraping Threads Is Different From Instagram

Threads is built on Meta's infrastructure, which means:

  • No public API (ActivityPub federation exists but doesn't expose profile data)
  • GraphQL API under the hood (similar to Instagram, same WAF)
  • Session-based rate limits: requests without a valid session get throttled quickly
  • JavaScript-rendered content — most data loads after initial HTML

The good news: Threads loads faster than Instagram and has less aggressive bot detection for profile data.

What You Can Extract

Available fields:

  • Username, display name, bio
  • Follower count
  • Following count
  • Verified status
  • Post text content
  • Post likes, replies, reposts
  • Post timestamps
  • Profile picture URL
  • External links in bio
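
If you plan to store these fields, it can help to pin down a record shape first. The sketch below is one possible layout, not a schema Threads itself defines: the class names `ThreadsProfile` and `ThreadsPost`, and field names such as `bio_links` and `profile_pic_url`, are assumptions chosen for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ThreadsPost:
    """One post; field names are illustrative, not an official schema."""
    text: str = ""
    likes: int = 0
    replies: int = 0
    reposts: int = 0
    timestamp: Optional[int] = None  # assumed to be Unix seconds


@dataclass
class ThreadsProfile:
    """One profile with the fields listed above."""
    username: str
    full_name: Optional[str] = None
    biography: Optional[str] = None
    follower_count: Optional[int] = None
    following_count: Optional[int] = None
    is_verified: bool = False
    profile_pic_url: Optional[str] = None
    bio_links: List[str] = field(default_factory=list)
    posts: List[ThreadsPost] = field(default_factory=list)


# Build a record and convert it to a plain dict for JSON or CSV export.
profile = ThreadsProfile(username="example_user", follower_count=1200)
profile.posts.append(ThreadsPost(text="hello", likes=3))
record = asdict(profile)
```

Optional fields default to `None` so that a missing value stays distinguishable from a real zero.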

Method 1: Direct API Approach (Python)

Threads serves its web client from private endpoints. With a valid session cookie, you can query the profile endpoint directly:

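import json

import requests

# Value of the `sessionid` cookie from a logged-in browser session.
SESSION_COOKIE = "your_sessionid_cookie_here"

def get_threads_profile(username):
    """Fetch basic profile fields for one username; return None on failure."""
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
        "Accept": "application/json",
        "X-IG-App-ID": "238260118697367",
        "Cookie": f"sessionid={SESSION_COOKIE}",
    }
    url = "https://www.threads.net/api/v1/users/web_profile_info/"
    try:
        # Pass the username as a query parameter so it is URL-encoded,
        # and set a timeout so a stalled connection cannot hang the script.
        r = requests.get(url, headers=headers, params={"username": username}, timeout=10)
    except requests.RequestException:
        return None
    if r.status_code != 200:
        return None
    try:
        payload = r.json()
    except ValueError:
        # A block or login page comes back as HTML rather than JSON.
        return None
    user_data = (payload.get("data") or {}).get("user") or {}
    return {
        "username": user_data.get("username"),
        "full_name": user_data.get("full_name"),
        "biography": user_data.get("biography"),
        "follower_count": user_data.get("follower_count"),
        "following_count": user_data.get("following_count"),
        "is_verified": user_data.get("is_verified"),
    }

profile = get_threads_profile("zuck")
print(json.dumps(profile, indent=2))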
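
Because `get_threads_profile` returns `None` whenever a request fails or is throttled, a retry wrapper with exponential backoff is a natural companion. The sketch below is a minimal version of that idea; the helper name `fetch_with_backoff` and its default delays are assumptions, not values tuned against Threads.

```python
import random
import time


def fetch_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or retries run out.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...)
    plus random jitter, so repeated failures slow the scraper down.
    """
    for attempt in range(retries):
        result = fetch()
        if result is not None:
            return result
        if attempt < retries - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return None


# Hypothetical usage with the function from Method 1:
#   profile = fetch_with_backoff(lambda: get_threads_profile("zuck"))
```

The `sleep` parameter is injectable only so the helper can be exercised without real delays.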

Method 2: Playwright with Stealth Mode

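For post content, which requires JavaScript execution, drive a real browser and capture the GraphQL responses it receives. This example relies on a mobile user-agent and viewport rather than a dedicated stealth plugin: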

from playwright.sync_api import sync_playwright

def scrape_threads_posts(username, max_posts=20):
    posts = []

    def handle_response(response):
        # Only inspect the GraphQL responses that carry thread data.
        if "threads_web" not in response.url or "graphql" not in response.url:
            return
        try:
            data = response.json()
        except Exception:
            return  # body was not JSON or could not be read
        if not isinstance(data, dict):
            return
        items = ((data.get("data") or {}).get("mediaData") or {}).get("threads") or []
        for item in items:
            for ti in item.get("thread_items") or []:
                post = ti.get("post") or {}
                if post:
                    posts.append({
                        # `caption` may be null, so fall back to an empty dict
                        "text": (post.get("caption") or {}).get("text", ""),
                        "likes": post.get("like_count", 0),
                        "replies": post.get("reply_count", 0),
                    })

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
            viewport={"width": 390, "height": 844}
        )
        page = context.new_page()
        page.on("response", handle_response)
        page.goto(f"https://www.threads.net/@{username}")
        # wait_for_timeout keeps Playwright dispatching events while waiting,
        # unlike time.sleep, which blocks the response handler.
        page.wait_for_timeout(3000)
        for _ in range(3):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)
        browser.close()
    return posts[:max_posts]
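
The parsing step is easier to maintain as a standalone function that can be checked against a saved response without launching a browser. The sketch below assumes the payload shape used in the handler above (`data.mediaData.threads[].thread_items[].post`); the function name `extract_posts` and the sample payload are made up for illustration, and the real response shape may change without notice.

```python
def extract_posts(payload):
    """Pull text, likes, and replies out of one GraphQL response body."""
    posts = []
    data = payload.get("data") or {}
    threads = (data.get("mediaData") or {}).get("threads") or []
    for thread in threads:
        for item in thread.get("thread_items") or []:
            post = item.get("post") or {}
            if not post:
                continue
            caption = post.get("caption") or {}  # tolerate a null caption
            posts.append({
                "text": caption.get("text", ""),
                "likes": post.get("like_count", 0),
                "replies": post.get("reply_count", 0),
            })
    return posts


# Hand-written sample in the assumed shape, not a captured response.
sample = {
    "data": {"mediaData": {"threads": [
        {"thread_items": [
            {"post": {"caption": {"text": "hi"}, "like_count": 5, "reply_count": 1}},
            {"post": {"caption": None, "like_count": 2}},
        ]},
    ]}}
}
parsed = extract_posts(sample)
```

Keeping the parser separate means a change in the response shape only needs a fix in one place.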

Method 3: Apify Actor (Easiest)

The Threads Profile Scraper handles session management and proxy rotation automatically.

import time

import requests

API = "https://api.apify.com/v2"
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}
# Run states after which polling should stop.
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}

# Start the actor run.
run = requests.post(
    f"{API}/acts/lanky_quantifier~threads-profile-scraper/runs",
    headers=HEADERS,
    json={"usernames": ["zuck", "mosseri"], "maxPostsPerProfile": 50},
    timeout=30,
).json()["data"]

# Poll until the run reaches a terminal state.
while True:
    status = requests.get(
        f"{API}/actor-runs/{run['id']}",
        headers=HEADERS,
        timeout=30,
    ).json()["data"]["status"]
    if status in TERMINAL:
        break
    time.sleep(5)

if status != "SUCCEEDED":
    raise RuntimeError(f"Apify run ended with status {status}")

# Fetch the items the run wrote to its default dataset.
results = requests.get(
    f"{API}/actor-runs/{run['id']}/dataset/items",
    headers=HEADERS,
    timeout=30,
).json()
for p in results:
    print(f"@{p['username']}: {p['follower_count']:,} followers")

Cost: ~$0.50 per 1,000 profiles scraped.
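
To budget for recurring runs, the quoted rate can be turned into a quick estimate. This is a back-of-the-envelope sketch that takes the approximate $0.50 per 1,000 profiles at face value; the function name and the `runs_per_month` parameter are assumptions, and actual billing may differ.

```python
COST_PER_1000_PROFILES = 0.50  # USD, the approximate figure quoted above


def estimate_cost(profile_count, runs_per_month=1):
    """Rough monthly cost in USD for scraping profile_count profiles per run."""
    return profile_count / 1000 * COST_PER_1000_PROFILES * runs_per_month


# Example: 500 profiles refreshed daily for a 30-day month.
monthly = estimate_cost(500, runs_per_month=30)
```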

Rate Limits and Anti-Detection

Behavior                            Risk level
>100 requests/min without session   🔴 Block
Datacenter IPs                      🟡 Medium
Residential IPs with session        🟢 Low
Mobile user-agent + session         🟢 Low
Use mobile user-agents (Threads is mobile-first) and residential proxies for sustained scraping.
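
Staying under a request-rate ceiling is simple to enforce on the client side. The sketch below is a minimal throttle that spaces calls evenly and adds a little random jitter; the class name `Throttle` and the default of 30 requests per minute are assumptions picked to sit well below the 100 requests/min figure above, not limits documented by Meta.

```python
import random
import time


class Throttle:
    """Space out calls so at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute=30, jitter=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.jitter = jitter      # upper bound, in seconds, of random extra delay
        self.clock = clock
        self.sleep = sleep
        self._last = None         # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining + random.uniform(0, self.jitter))
        self._last = self.clock()


# Hypothetical usage: call throttle.wait() before each request.
throttle = Throttle(max_per_minute=30)
```

The clock and sleep functions are injectable so the spacing logic can be verified without waiting in real time.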

Use Cases

  • Influencer research: Track follower growth before partnerships
  • Competitor monitoring: Watch brand accounts for content strategy changes
  • Trend detection: Find which topics drive highest engagement
  • Lead generation: Find professionals by keywords in their bio
  • Market research: Track public sentiment on product launches
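
As a small illustration of the lead-generation case, scraped profiles can be filtered by bio keywords once they are collected. The sketch below assumes each profile is a dict with a `biography` key, as returned by the Method 1 function; the function name `filter_by_bio` and the sample data are made up.

```python
def filter_by_bio(profiles, keywords):
    """Return profiles whose bio contains any keyword, case-insensitively."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for profile in profiles:
        bio = (profile.get("biography") or "").lower()  # bio may be missing or null
        if any(k in bio for k in wanted):
            matches.append(profile)
    return matches


# Made-up sample data for illustration.
sample_profiles = [
    {"username": "a", "biography": "Data Engineer at Example Corp"},
    {"username": "b", "biography": "Coffee and hiking"},
    {"username": "c", "biography": None},
]
leads = filter_by_bio(sample_profiles, ["engineer", "founder"])
```

This is plain substring matching, so a keyword like "art" would also match "startup"; word-boundary matching is a reasonable next step if that matters.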

The right method depends on your scale. For one-time research, use the direct API approach. For ongoing monitoring of hundreds of profiles, use Apify's managed scraper to avoid session rotation headaches.


Save hours on scraping setup: The $29 Apify Scrapers Bundle includes 35+ production-ready actors — Google SERP, LinkedIn, Amazon, TikTok, contact info, and more. Pre-configured inputs, working on day one.

Get the Bundle ($29) →
