DEV Community

Vhub Systems

How to Scrape Instagram Profile Data Without the Graph API in 2026

Instagram's official Graph API requires a business account and Meta (Facebook) app review before you get meaningful access. For research, competitive analysis, or lead generation, here's what works without the API.

What's publicly accessible

Instagram public profiles expose:

  • Username, full name, bio, website
  • Post count, followers, following
  • Profile picture URL
  • Recent posts (thumbnails, captions, like/comment counts)
  • Story highlights (titles only)

What requires auth: stories, DMs, private accounts, detailed post insights.
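You can keep the public fields listed above together by parsing them into a small record. This is a minimal sketch. `PublicProfile` and `from_user_json` are hypothetical names. The nested `edge_*` count keys follow the JSON shape the scrapers below read, and Instagram may change that shape at any time. The sample values come from the example output in Method 4.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PublicProfile:
    """Fields an unauthenticated visitor can see on a public profile (hypothetical record)."""
    username: str
    full_name: Optional[str]
    bio: Optional[str]
    website: Optional[str]
    followers: Optional[int]
    following: Optional[int]
    posts: Optional[int]
    is_verified: bool = False

    @classmethod
    def from_user_json(cls, user: dict[str, Any]) -> "PublicProfile":
        # Counts are assumed to be nested as {"edge_followed_by": {"count": N}};
        # any missing key becomes None instead of raising.
        def count(key: str) -> Optional[int]:
            return (user.get(key) or {}).get("count")

        return cls(
            username=user.get("username", ""),
            full_name=user.get("full_name"),
            bio=user.get("biography"),
            website=user.get("external_url"),
            followers=count("edge_followed_by"),
            following=count("edge_follow"),
            posts=count("edge_owner_to_timeline_media"),
            is_verified=bool(user.get("is_verified")),
        )


sample = {
    "username": "natgeo",
    "full_name": "National Geographic",
    "biography": "Experience the world through the eyes of National Geographic photographers.",
    "edge_followed_by": {"count": 279000000},
    "edge_follow": {"count": 143},
    "is_verified": True,
}
profile = PublicProfile.from_user_json(sample)
print(profile.followers, profile.posts)  # 279000000 None
```

The `posts` field prints `None` because the sample has no `edge_owner_to_timeline_media` key. Missing data stays visible instead of turning into a misleading zero.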

Method 1: Instagram's public data endpoints

Instagram's web app uses undocumented JSON endpoints that have returned profile data without authentication. They are unofficial and change often:

import requests

def get_instagram_profile(username: str) -> dict:
    url = f"https://www.instagram.com/{username}/?__a=1&__d=dis"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "application/json",
        "X-IG-App-ID": "936619743392459",
    }

    session = requests.Session()
    # Load the home page first so the session picks up Instagram's cookies
    session.get("https://www.instagram.com/", headers={"User-Agent": headers["User-Agent"]}, timeout=10)

    response = session.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        return {}
    try:
        data = response.json()
    except ValueError:
        # Instagram served HTML (login wall or block page) instead of JSON
        return {}

    # Older responses nest the profile under "graphql"; web_profile_info uses "data"
    user = (data.get("graphql") or data.get("data") or {}).get("user") or {}
    if not user:
        return {}

    return {
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "bio": user.get("biography"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
        "posts": user.get("edge_owner_to_timeline_media", {}).get("count"),
        "is_verified": user.get("is_verified"),
        "website": user.get("external_url"),
    }

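Note: Instagram changes and breaks this endpoint regularly. A 401 or 403 usually means you have been rate-limited or hit the login wall. Rotating user agents, adding delays between requests, or switching IPs sometimes helps.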
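You can automate the "add delays" advice with a retry wrapper. This is a minimal sketch. It assumes that 401, 403 and 429 are worth retrying after a pause, which Instagram does not document. `fetch_with_backoff` is a hypothetical helper and is not part of `requests`.

```python
import random
import time
from typing import Callable, Protocol, TypeVar


class HasStatusCode(Protocol):
    status_code: int


R = TypeVar("R", bound=HasStatusCode)

# Assumption: these statuses mean "slow down", not "permanently denied".
RETRYABLE_STATUSES = {401, 403, 429}


def fetch_with_backoff(
    fetch: Callable[[], R],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call fetch() until it returns a non-retryable status or attempts run out.

    Between attempts it waits base_delay * 2**attempt seconds plus up to
    1 s of random jitter, and it returns the last response either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return response
```

To use it with Method 1, wrap the request in a zero-argument callable, for example `fetch_with_backoff(lambda: session.get(url, headers=headers, timeout=10))`. Passing `sleep` in as a parameter keeps the function testable without real waiting.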

Method 2: Playwright with session warm-up

For more reliable extraction, use a headless browser. It loads the profile like a real visitor, and you capture the JSON that the page itself requests:

from playwright.async_api import async_playwright
import asyncio

async def scrape_instagram_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox"]
        )

        # Mobile UA gets simpler page structure
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
            viewport={"width": 390, "height": 844},
        )

        page = await context.new_page()

        # Filled in by the route handler when the profile API call goes through
        profile_data = {}

        async def handle_route(route, request):
            # Perform the request ourselves, read the JSON, then hand the same
            # response back to the page (route.continue_() here would send it twice)
            response = await route.fetch()
            try:
                data = await response.json()
                user = (data.get("data") or {}).get("user") or {}
                profile_data.update({
                    "username": user.get("username"),
                    "followers": user.get("edge_followed_by", {}).get("count"),
                    "following": user.get("edge_follow", {}).get("count"),
                    "bio": user.get("biography"),
                })
            except Exception:
                pass  # Not JSON (e.g. login wall): leave profile_data empty
            await route.fulfill(response=response)

        # Intercept only the profile API call instead of every request
        await page.route(lambda url: "api/v1/users/web_profile_info" in url, handle_route)
        await page.goto(f"https://www.instagram.com/{username}/")
        await page.wait_for_timeout(3000)  # give the page time to fire the API call

        await browser.close()
        return profile_data

profile = asyncio.run(scrape_instagram_profile("natgeo"))
print(profile)

Method 3: Public search without auth

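Instagram's web search endpoint, which powers the search box, has returned some profile data without login. Treat it as unreliable, because it is undocumented and may require a session: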

import requests

def search_instagram_users(query: str) -> list:
    url = "https://www.instagram.com/web/search/topsearch/"
    params = {
        "context": "blended",
        "query": query,
        "rank_token": "0.0",
    }
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }

    session = requests.Session()
    # Get initial cookies
    session.get("https://www.instagram.com/", headers={"User-Agent": headers["User-Agent"]}, timeout=10)

    response = session.get(url, params=params, headers=headers, timeout=10)
    if response.status_code != 200:
        return []
    try:
        data = response.json()
    except ValueError:
        return []  # HTML login page instead of JSON
    return [
        {
            "username": u.get("user", {}).get("username"),
            "full_name": u.get("user", {}).get("full_name"),
            # May be missing from search results; None means "unknown"
            "followers": u.get("user", {}).get("follower_count"),
        }
        for u in data.get("users", [])
    ]

# Find influencers in a niche
users = search_instagram_users("web scraping developer")
print(users[:5])

Method 4: Pre-built actor (recommended for scale)

The Instagram Profile Scraper actor on Apify handles auth rotation, proxy management, and rate limiting. It takes usernames or profile URLs as input.

Sample output:

{
  "username": "natgeo",
  "fullName": "National Geographic",
  "bio": "Experience the world through the eyes of National Geographic photographers.",
  "followersCount": 279000000,
  "followingCount": 143,
  "postsCount": 30500,
  "isVerified": true,
  "website": "https://www.nationalgeographic.com",
  "profilePicUrl": "https://..."
}

The actor has 107+ production runs and uses pay-per-result pricing.
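If you combine this actor's output with results from Methods 1-3, the key names differ: the sample output above uses camelCase, while `get_instagram_profile()` uses snake_case. The normalizer below is a hypothetical sketch that only knows the keys shown in the sample output. It assumes the actor may emit other fields, and it passes those through unchanged.

```python
from typing import Any

# Actor field -> snake_case key, matching get_instagram_profile() in Method 1
# where the two overlap ("profile_pic_url" is a new name chosen here).
FIELD_MAP = {
    "username": "username",
    "fullName": "full_name",
    "bio": "bio",
    "followersCount": "followers",
    "followingCount": "following",
    "postsCount": "posts",
    "isVerified": "is_verified",
    "website": "website",
    "profilePicUrl": "profile_pic_url",
}


def normalize_actor_item(item: dict[str, Any]) -> dict[str, Any]:
    """Rename known camelCase keys to snake_case; keep unknown keys as-is."""
    return {FIELD_MAP.get(key, key): value for key, value in item.items()}


item = {"username": "natgeo", "fullName": "National Geographic", "followersCount": 279000000}
print(normalize_actor_item(item))
# {'username': 'natgeo', 'full_name': 'National Geographic', 'followers': 279000000}
```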

Anti-detection tips

Instagram's rate limits are aggressive:

  • Expect a soft block after roughly 50 profile requests per session
  • Use residential proxies for bulk extraction
  • Add 2-5 second delays between requests
  • Rotate sessions (new browser context) every 20-30 requests
  • Mobile user agents tend to work better than desktop
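The pacing rules above can be wrapped in a small generator: a 2-5 second pause between requests and a fresh session every 20-30 requests. This is a minimal sketch. `paced_sessions` is a hypothetical helper, and the rotation interval of 25 is an assumption picked from that range. `make_session` is whatever creates your session, for instance `requests.Session`.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

S = TypeVar("S")


def paced_sessions(
    items: Iterable[str],
    make_session: Callable[[], S],
    rotate_every: int = 25,
    delay_range: Tuple[float, float] = (2.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, S]]:
    """Yield (item, session) pairs, pausing between items and
    replacing the session every `rotate_every` items."""
    session = make_session()
    for i, item in enumerate(items):
        if i > 0:
            # Random pause before every request except the first
            sleep(random.uniform(*delay_range))
            if i % rotate_every == 0:
                # Fresh session means fresh cookies
                session = make_session()
        yield item, session
```

A typical loop looks like `for username, session in paced_sessions(usernames, requests.Session): ...`. With residential proxies, pass a factory that sets `session.proxies` on each new `requests.Session` so that every rotation can also switch proxies.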

Use cases

  • Influencer discovery: find accounts in a niche by follower range + bio keywords
  • Brand monitoring: track competitor account growth
  • Lead generation: identify Instagram-active potential customers
  • Market research: analyze posting frequency + engagement patterns
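Influencer discovery boils down to filtering scraped profiles by follower range and bio keywords. This minimal sketch assumes dicts shaped like the output of `get_instagram_profile()` in Method 1, with `followers` and `bio` keys. `find_influencers` is a hypothetical helper. Search results from Method 3 carry no bio, so a keyword filter would drop them. Fetch the full profile first.

```python
from typing import Any, Iterable, Optional


def find_influencers(
    profiles: Iterable[dict[str, Any]],
    min_followers: int,
    max_followers: Optional[int] = None,
    keywords: Iterable[str] = (),
) -> list[dict[str, Any]]:
    """Keep profiles within the follower range whose bio mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for p in profiles:
        followers = p.get("followers")
        # Skip unknown counts and accounts outside the range
        if followers is None or followers < min_followers:
            continue
        if max_followers is not None and followers > max_followers:
            continue
        # Case-insensitive substring match; no keywords means "any bio"
        bio = (p.get("bio") or "").lower()
        if wanted and not any(k in bio for k in wanted):
            continue
        matches.append(p)
    # Largest accounts first
    return sorted(matches, key=lambda p: p["followers"], reverse=True)
```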

n8n AI Automation Pack ($39) — 5 production-ready workflows

Pre-built and maintained

Skip the extraction layer entirely:

Apify Scrapers Bundle — $29 one-time

35+ scrapers including Instagram, Twitter, LinkedIn, Amazon, and more.
