How to Scrape Instagram Without Getting Banned in 2026 (3 Working Methods)

#webscraping #instagram #python #automation

Instagram blocks scrapers faster than almost any other platform. Its bot detection has improved sharply since 2024, and a stock Selenium setup is typically flagged within minutes.

Here are the 3 methods that actually work, tested in production as of April 2026.

Why Instagram Is Hard to Scrape

Instagram uses several layers of bot detection:

Rate limiting: too many requests from one IP triggers a block
Fingerprinting: checks that browser and device signals are consistent with each other
Account-based restrictions: logged-in sessions have their own rate limits
API deprecation: the old Basic Display API was shut down in December 2024, leaving only the permission-gated Graph API

You can't just use requests.get() on Instagram URLs and parse the HTML. The data is loaded dynamically, and anonymous browsing is heavily restricted.

Method 1: Apify Instagram Scraper (Recommended)

Success rate: ~100% for public profiles

Cost: ~$0.004 per profile

Setup time: 10 minutes

This is the most reliable production-ready option. Apify's Instagram Profile Scraper (apify.com/store) uses residential proxies and rotating sessions to avoid detection.

What it extracts:

Follower/following counts
Bio, website URL
Post count
Recent posts (captions, likes, comments, date)
Profile metadata

Input:

{
  "usernames": ["nationalgeographic", "nasa", "nike"],
  "resultsLimit": 50
}
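
One way to submit this input from Python is through the `apify-client` package. The sketch below is an illustration under assumptions: the actor ID `apify/instagram-profile-scraper`, the `APIFY_TOKEN` environment variable, and the helper names `build_run_input` and `fetch_profiles` are not taken from this post, so check the actor's page for the exact ID and input schema before relying on it.

```python
ACTOR_ID = "apify/instagram-profile-scraper"  # assumed actor ID; confirm on the actor's page


def build_run_input(usernames, results_limit=50):
    """Build the actor input shown above, dropping blanks and leading '@' signs."""
    cleaned = [u.strip().lstrip("@") for u in usernames if u.strip()]
    return {"usernames": cleaned, "resultsLimit": results_limit}


def fetch_profiles(client, usernames):
    """Run the actor, wait for it to finish, and return the dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(usernames))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an API token):
#   import os
#   from apify_client import ApifyClient
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   for profile in fetch_profiles(client, ["nasa", "nike"]):
#       print(profile["username"], profile["followersCount"])
```

Passing the client in as an argument keeps the token handling in one place and makes the function easy to exercise with a stub client.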

Output:

{
  "username": "nationalgeographic",
  "followersCount": 280000000,
  "followsCount": 176,
  "postsCount": 35847,
  "biography": "...",
  "externalUrl": "https://www.nationalgeographic.com",
  "isVerified": true
}

Limitations:

Public profiles only (no private account data)
Post details limited to public content
Not suitable for real-time monitoring at high frequency

Method 2: Playwright with Residential Proxies

Success rate: 60-80% depending on proxy quality

Cost: $8-15/GB residential proxy + compute

Setup time: 2-4 hours

If you need more control or want to run on your own infrastructure:

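import asyncio

from playwright.async_api import async_playwright

# Full iOS Safari user agent; a truncated string is itself a fingerprinting signal
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
)

async def scrape_instagram_profile(username: str, proxy: dict) -> str:
    """Load a public profile page through a proxy and return its HTML."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy,  # {"server": "...", "username": "...", "password": "..."}
        )
        try:
            context = await browser.new_context(
                user_agent=MOBILE_UA,
                viewport={"width": 390, "height": 844},
                device_scale_factor=3,
                is_mobile=True,
                has_touch=True,
            )
            page = await context.new_page()

            # Same www URL as desktop; the mobile context above is what draws
            # the less aggressive bot detection
            await page.goto(
                f"https://www.instagram.com/{username}/",
                wait_until="networkidle",
                timeout=30000,
            )

            # Extract from page source
            content = await page.content()
            # Parse the __data variable that Instagram embeds
            # ...
            return content
        finally:
            # Close the browser even if navigation times out
            await browser.close()

# Example: html = asyncio.run(scrape_instagram_profile("nasa", proxy))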
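
The parsing step is left open above. As one possible fallback, and not the embedded-data approach the comment refers to, the sketch below reads the counts from the page's `og:description` meta tag. It assumes the tag is present and worded like "280M Followers, 176 Following, 35,847 Posts"; that wording is an assumption that may not hold for every page or locale, and the names `parse_profile_counts` and `_MetaFinder` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Collect the content attribute of <meta property="og:description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:description":
            self.description = attrs.get("content")


_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def _to_int(number: str, suffix: str) -> int:
    """Convert '35,847' or ('280', 'M') style values to an integer."""
    return round(float(number.replace(",", "")) * _SUFFIX[suffix.upper()])


def parse_profile_counts(html: str) -> dict:
    """Return follower/following/post counts found in og:description, if any."""
    finder = _MetaFinder()
    finder.feed(html)
    counts = {}
    if finder.description:
        # Assumed wording: "<n> Followers, <n> Following, <n> Posts"
        pattern = r"([\d.,]+)\s*([KMB]?)\s+(Followers|Following|Posts)"
        for number, suffix, label in re.findall(pattern, finder.description, flags=re.I):
            counts[label.lower()] = _to_int(number, suffix)
    return counts
```

Abbreviated values such as "280M" are rounded by Instagram, so treat the result as approximate. An empty dict means the tag was missing, which usually indicates a login wall or a block page.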

Critical details that make this work:

Use a mobile user agent with a matching viewport and touch settings; Instagram's mobile detection is weaker
Use residential proxies, never datacenter
Add realistic timing delays (2-5 seconds between requests)
Rotate proxies at least every 10-15 requests
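
The last two points can be sketched with the standard library alone. This is a minimal illustration, not a tuned implementation: `ProxyRotator` and `human_delay` are hypothetical names, and the default of 10 requests per proxy is simply the lower end of the range above.

```python
import random
from itertools import cycle


class ProxyRotator:
    """Hand out proxies round-robin, switching after a fixed number of requests."""

    def __init__(self, proxies, requests_per_proxy=10):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)
        self._limit = requests_per_proxy
        self._used = 0
        self._current = next(self._pool)

    def next_proxy(self):
        """Return the proxy to use for the next request."""
        if self._used >= self._limit:
            # Budget for the current proxy is spent; move to the next one
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


def human_delay(low=2.0, high=5.0, rng=random):
    """Return a randomized pause in seconds within [low, high]."""
    return rng.uniform(low, high)
```

In the scraper above this would look like `proxy = rotator.next_proxy()` before each launch and `await asyncio.sleep(human_delay())` between page loads. A uniform random delay avoids the fixed cadence that makes automated traffic easy to spot.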

Method 3: Official Instagram Graph API (Limited)

Success rate: 100% but restricted data

Cost: Free (within rate limits)

What you can access: Only your OWN account data or Business accounts that grant you permission

The Graph API still works but it's not for scraping competitor data:

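import requests

# Only works for your own connected accounts
ACCESS_TOKEN = "your_access_token"
USER_ID = "your_user_id"

response = requests.get(
    f"https://graph.instagram.com/{USER_ID}/media",
    params={
        "fields": "id,media_type,timestamp,like_count,comments_count",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
response.raise_for_status()  # surface expired tokens and permission errors

# Results arrive under "data", one dict per media item
for item in response.json()["data"]:
    print(item["id"], item["media_type"], item.get("like_count"))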
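
A single request returns only the first page of media. Graph API list responses carry a `paging.next` URL when more results exist, so a small generator can walk the pages. The sketch below uses only the standard library; `iter_media`, `fetch_json`, and the `max_pages` cap are hypothetical names chosen for illustration.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_media(first_url: str, fetch=fetch_json, max_pages: int = 10):
    """Yield media items page by page, following paging.next links."""
    url = first_url
    for _ in range(max_pages):
        if not url:
            break
        page = fetch(url)
        yield from page.get("data", [])
        # The "next" key is absent on the last page
        url = page.get("paging", {}).get("next")
```

The first URL is the same `/media` request as above with the `fields` and `access_token` query parameters already attached. The `max_pages` cap keeps a large account from consuming the rate limit in one run.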

Use this only for: Analyzing your own Instagram account metrics, managing content for accounts you own.

What NOT to Do

Avoid these common mistakes:

❌ Scrapy with residential proxies: Scrapy sends plain HTTP requests without executing JavaScript or presenting a browser fingerprint, so Instagram flags its request patterns almost immediately, whatever the proxy quality.

❌ Logged-in automation without proper fingerprinting: Logging in to an Instagram account from an automated browser gets that account banned within hours.

❌ High-frequency requests: Even with perfect proxies, requesting more than 5-10 profiles per minute from one IP triggers detection.

❌ Direct mobile API calls: Instagram's internal API (/api/v1/users/{pk}/info/) requires valid session tokens that expire quickly.
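
The 5-10 profiles per minute ceiling is easier to respect when it is enforced in code rather than by convention. Below is a minimal sliding-window sketch keyed by proxy. The class name `PerProxyLimiter` and its defaults (5 requests per 60 seconds, the conservative end of that range) are illustrative assumptions.

```python
import time
from collections import defaultdict, deque


class PerProxyLimiter:
    """Sliding-window cap on requests per proxy (default: 5 per 60 seconds)."""

    def __init__(self, max_requests=5, window=60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window
        self._clock = clock
        self._hits = defaultdict(deque)  # proxy -> timestamps of recent requests

    def wait_time(self, proxy):
        """Seconds to wait before this proxy may send another request."""
        now = self._clock()
        hits = self._hits[proxy]
        # Drop timestamps that have aged out of the window
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) < self._max:
            return 0.0
        return self._window - (now - hits[0])

    def record(self, proxy):
        """Note that a request was just sent through this proxy."""
        self._hits[proxy].append(self._clock())
```

Before each request, sleep for `limiter.wait_time(proxy)` seconds and then call `limiter.record(proxy)`. The injectable `clock` exists only so the logic can be tested without real waiting.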

Real-World Use Cases

Influencer research: Validate follower counts and engagement rates before a partnership. At $0.004/profile, you can check 1,000 influencer profiles for $4 vs paying an influencer marketing platform $200/month.

Competitor monitoring: Track follower growth for 20-30 competitors weekly. At $0.004/profile that comes to well under $1/month; even daily checks cost only about $3/month.
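
Growth rate here is just the percent change between consecutive follower counts. A small sketch, assuming you store one (date, followersCount) pair per run; the function name `weekly_growth` and the storage format are illustrative choices, not something a scraper provides out of the box.

```python
def weekly_growth(snapshots: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Percent change in followers between consecutive snapshots."""
    growth = []
    for (_, prev), (date, curr) in zip(snapshots, snapshots[1:]):
        if prev > 0:  # skip pairs where a percentage is undefined
            growth.append((date, (curr - prev) * 100 / prev))
    return growth


history = [("2026-04-06", 1000), ("2026-04-13", 1100), ("2026-04-20", 1045)]
for date, pct in weekly_growth(history):
    print(f"{date}: {pct:+.1f}%")
```

The sample `history` values are made up for the example; in practice they would come from the `followersCount` field of each scheduled run.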

Brand monitoring: Check whether your brand is mentioned in public post captions on the profiles you care about.

Market research: Analyze what content performs in your niche by scraping public profiles and ranking posts by engagement.
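
Both the influencer check and the content ranking come down to the same arithmetic on scraped data. The sketch below uses one common definition of engagement rate: average likes plus comments per recent post, divided by followers. The field names `latestPosts`, `likesCount`, and `commentsCount` are assumptions about the scraper output, since only `followersCount` appears in the sample above; adjust them to whatever your dataset contains.

```python
from statistics import mean


def post_engagement(post: dict) -> int:
    """Likes plus comments for a single post."""
    return post.get("likesCount", 0) + post.get("commentsCount", 0)


def engagement_rate(profile: dict) -> float:
    """Average interactions per recent post as a fraction of followers."""
    posts = profile.get("latestPosts", [])
    followers = profile.get("followersCount", 0)
    if not posts or followers <= 0:
        return 0.0
    return mean(post_engagement(p) for p in posts) / followers


def top_posts(profile: dict, n: int = 3) -> list:
    """Return the n recent posts with the most interactions, best first."""
    return sorted(profile.get("latestPosts", []), key=post_engagement, reverse=True)[:n]
```

A profile with a large follower count and a rate far below its peers is worth a closer look before a partnership. Other definitions exist (for example, dividing by reach instead of followers), so compare like with like.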

Summary: Which Method to Use

Use Case	Method	Why
One-off research	Apify actor	Easiest, no infrastructure
Daily monitoring	Apify scheduled run	Set-and-forget, reliable
High volume (10k+/day)	Playwright + proxies	More control, lower per-unit cost
Own account data	Official API	Free, within ToS

For most teams: start with Apify. At $0.004/profile, the cost is negligible vs developer time spent maintaining a custom scraper.

The Apify Scrapers Bundle includes the Instagram Profile Scraper along with 34 other production actors — Google SERP, Amazon, LinkedIn, TikTok Shop, contact info. Pre-configured inputs so you're running in 10 minutes.
