Instagram bans scrapers faster than almost any other platform. Its bot detection has improved dramatically since 2024, and traditional Selenium setups now fail within minutes.
Here are the 3 methods that actually work, tested in production as of April 2026.
Why Instagram Is Hard to Scrape
Instagram uses several layers of bot detection:
- Rate limiting: too many requests from one IP gets that IP blocked
- Fingerprinting: browser and device consistency checks
- Account-based restrictions: logged-in sessions have their own rate limits
- Basic Display API shutdown: the old Basic Display API was retired in December 2024, so the only official route left is the Graph API for accounts you manage
You can't just use requests.get() on Instagram URLs and parse the HTML. The data is loaded dynamically, and anonymous browsing is heavily restricted.
Method 1: Apify Instagram Scraper (Recommended)
Success rate: ~100% for public profiles
Cost: ~$0.004 per profile
Setup time: 10 minutes
This is the most reliable production-ready option. Apify's Instagram Profile Scraper (apify.com/store) uses residential proxies and rotating sessions to avoid detection.
What it extracts:
- Follower/following counts
- Bio, website URL
- Post count
- Recent posts (captions, likes, comments, date)
- Profile metadata
Input:
{
  "usernames": ["nationalgeographic", "nasa", "nike"],
  "resultsLimit": 50
}
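One way to submit this input from Python is Apify's `apify-client` package. The sketch below is illustrative rather than the article's own setup: the actor ID `apify/instagram-profile-scraper`, the `APIFY_TOKEN` environment variable, and both helper names are assumptions, and the input keys are copied from the JSON above.

```python
import os
from typing import Iterator, List


def build_run_input(usernames: List[str], results_limit: int = 50) -> dict:
    """Build the actor input shown above, dropping blanks and duplicates."""
    seen = set()
    cleaned = []
    for name in usernames:
        handle = name.strip().lstrip("@").lower()
        if handle and handle not in seen:
            seen.add(handle)
            cleaned.append(handle)
    return {"usernames": cleaned, "resultsLimit": results_limit}


def run_profile_scraper(usernames: List[str]) -> Iterator[dict]:
    """Run the actor once and yield one dict per scraped profile."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    # Assumed actor ID; confirm it on the actor's page in the Apify Store.
    run = client.actor("apify/instagram-profile-scraper").call(
        run_input=build_run_input(usernames)
    )
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

Cleaning the username list before the run avoids paying for duplicate lookups of the same profile.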
Output:
{
  "username": "nationalgeographic",
  "followersCount": 280000000,
  "followsCount": 176,
  "postsCount": 35847,
  "biography": "...",
  "externalUrl": "https://www.nationalgeographic.com",
  "isVerified": true
}
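Downstream code is easier to write against a typed record than a raw dict. The following is a minimal sketch that maps the output fields shown above onto a dataclass; `ProfileSummary` and `parse_profile` are hypothetical names, and only the keys from the sample output are assumed to exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileSummary:
    username: str
    followers: int
    following: int
    posts: int
    biography: str
    external_url: Optional[str]
    is_verified: bool


def parse_profile(item: dict) -> ProfileSummary:
    """Map one output record to a typed summary.

    Raises KeyError if the username or any of the three counts is missing,
    which is usually a sign that the profile could not be scraped.
    """
    return ProfileSummary(
        username=item["username"],
        followers=int(item["followersCount"]),
        following=int(item["followsCount"]),
        posts=int(item["postsCount"]),
        biography=item.get("biography") or "",
        external_url=item.get("externalUrl") or None,
        is_verified=bool(item.get("isVerified", False)),
    )
```

Failing loudly on missing counts keeps half-empty records out of any follower statistics computed later.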
Limitations:
- Public profiles only (no private account data)
- Post details limited to public content
- Not suitable for real-time monitoring at high frequency
Method 2: Playwright with Residential Proxies
Success rate: 60-80% depending on proxy quality
Cost: $8-15/GB residential proxy + compute
Setup time: 2-4 hours
If you need more control or want to run on your own infrastructure:
import asyncio

from playwright.async_api import async_playwright


async def scrape_instagram_profile(username: str, proxy: dict) -> str:
    """Load a public profile page through a proxy and return its HTML."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy,  # {"server": "...", "username": "...", "password": "..."}
        )
        try:
            context = await browser.new_context(
                # Complete iOS 17 Safari user-agent string
                user_agent=(
                    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                    "Version/17.0 Mobile/15E148 Safari/604.1"
                ),
                viewport={"width": 390, "height": 844},
                device_scale_factor=3,
                is_mobile=True,
                has_touch=True,
            )
            page = await context.new_page()
            # Mobile emulation on the regular URL: less aggressive bot detection
            await page.goto(
                f"https://www.instagram.com/{username}/",
                wait_until="networkidle",
                timeout=30000,
            )
            # Extract from page source
            content = await page.content()
            # Parse the __data variable that Instagram embeds
            # ...
            return content
        finally:
            # Close the browser even if navigation times out
            await browser.close()
Critical details that make this work:
- Use mobile user agents: Instagram's detection on the mobile web view is weaker
- Use residential proxies, never datacenter
- Add realistic timing delays (2-5 seconds between requests)
- Rotate proxies at least every 10-15 requests
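The last two points can be sketched with the standard library alone. This is a minimal illustration, not a tuned implementation: `rotating_proxies` and `human_delay` are hypothetical helpers, and the defaults simply mirror the numbers in the list above.

```python
import itertools
import random
from typing import Iterator, List


def rotating_proxies(proxies: List[dict], requests_per_proxy: int = 10) -> Iterator[dict]:
    """Yield one proxy per request, moving to the next after `requests_per_proxy` uses."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    if requests_per_proxy < 1:
        raise ValueError("requests_per_proxy must be >= 1")
    for proxy in itertools.cycle(proxies):
        for _ in range(requests_per_proxy):
            yield proxy


def human_delay(low: float = 2.0, high: float = 5.0) -> float:
    """Return a random pause length in seconds within [low, high]."""
    return random.uniform(low, high)
```

In a scraping loop, take `next(proxy_iter)` before each request and `await asyncio.sleep(human_delay())` after it, so both the exit IP and the request timing vary.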
Method 3: Official Instagram Graph API (Limited)
Success rate: 100% but restricted data
Cost: Free (within rate limits)
What you can access: Only your OWN account data, or Business and Creator accounts that grant you permission
The Graph API still works, but it's not for scraping competitor data:
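import requests

# Only works for your own connected accounts
ACCESS_TOKEN = "your_access_token"
USER_ID = "your_user_id"

response = requests.get(
    f"https://graph.instagram.com/{USER_ID}/media",
    params={
        "fields": "id,media_type,timestamp,like_count,comments_count",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
response.raise_for_status()  # surface expired tokens and permission errors
media = response.json()["data"]  # list of media objects with the requested fields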
Use this only for: Analyzing your own Instagram account metrics, managing content for accounts you own.
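For account metrics, the JSON body of a `/media` response can be reduced to a few totals. The sketch below assumes the usual Graph API envelope, a `data` list plus an optional `paging.next` URL, and uses only the fields requested above; `summarize_media` is a hypothetical helper, and counts that are absent from an item are treated as zero.

```python
from collections import Counter
from typing import Optional


def summarize_media(payload: dict) -> dict:
    """Aggregate likes, comments, and media types from one page of results."""
    items = payload.get("data", [])
    # Some items may omit a count, so default to zero instead of failing.
    likes = sum(item.get("like_count", 0) for item in items)
    comments = sum(item.get("comments_count", 0) for item in items)
    by_type = Counter(item.get("media_type", "UNKNOWN") for item in items)
    next_url: Optional[str] = payload.get("paging", {}).get("next")
    return {
        "posts": len(items),
        "likes": likes,
        "comments": comments,
        "by_type": dict(by_type),
        "next_url": next_url,  # None when this was the last page
    }
```

Calling `summarize_media(response.json())` on each page and following `next_url` until it is `None` covers the full media history of the account.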
What NOT to Do
Avoid these common mistakes:
❌ Scrapy with residential proxies — Instagram detects Scrapy's request patterns almost immediately.
❌ Logged-in automation without proper fingerprinting — Logging into an Instagram account via automation gets that account banned within hours.
❌ High-frequency requests — Even with perfect proxies, requesting more than 5-10 profiles/minute from one IP causes detection.
❌ Direct mobile API calls — Instagram's internal API (/api/v1/users/{pk}/info/) requires valid session tokens that expire quickly.
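The per-IP ceiling mentioned above can be enforced in code rather than by hoping the delays add up. This is a minimal sliding-window sketch; `PerIpLimiter` is a hypothetical class, and the default of 5 requests per minute is taken from the lower bound quoted above rather than from any documented Instagram limit.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class PerIpLimiter:
    """Sliding-window limiter: at most `max_per_minute` requests per IP."""

    def __init__(self, max_per_minute: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock
        self._sent: Dict[str, Deque[float]] = {}

    def wait_time(self, ip: str) -> float:
        """Seconds until `ip` may send again (0.0 if it can send now)."""
        now = self.clock()
        window = self._sent.setdefault(ip, deque())
        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) < self.max_per_minute:
            return 0.0
        return 60.0 - (now - window[0])

    def record(self, ip: str) -> None:
        """Note that a request was just sent through `ip`."""
        self._sent.setdefault(ip, deque()).append(self.clock())
```

Before each request, sleep for `wait_time(ip)` seconds (or switch to another proxy if it is nonzero), then call `record(ip)` once the request goes out. The injectable `clock` exists only to make the class easy to test.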
Real-World Use Cases
Influencer research: Validate follower counts and engagement rates before a partnership. At $0.004/profile, you can check 1,000 influencer profiles for $4 vs paying an influencer marketing platform $200/month.
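Engagement rate has no single official definition. A common convention, used in this sketch, is average interactions per post divided by follower count. The post keys `likesCount` and `commentsCount` are assumed names for illustration, since the sample output above does not show the per-post schema.

```python
from typing import List


def engagement_rate(followers: int, posts: List[dict]) -> float:
    """Average (likes + comments) per post as a fraction of followers."""
    if followers <= 0 or not posts:
        return 0.0
    interactions = sum(
        post.get("likesCount", 0) + post.get("commentsCount", 0)  # assumed keys
        for post in posts
    )
    return interactions / len(posts) / followers


def lookup_cost(profiles: int, price_per_profile: float = 0.004) -> float:
    """Cost in dollars of checking `profiles` profiles once."""
    return round(profiles * price_per_profile, 2)
```

A large follower count paired with a very low engagement rate is a common reason to look more closely at an account before a partnership.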
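Competitor monitoring: Track follower growth rate for 20-30 competitors weekly. That is roughly 90-130 lookups a month, well under $1 at $0.004/profile; the ~$3 monthly figure corresponds to checking the same list daily.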
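Both the growth figure and the monthly bill are short calculations. The helpers below are a hypothetical sketch: growth is measured between two follower snapshots, and cost is profiles times runs times the $0.004 per-profile price quoted in this article.

```python
def growth_rate(previous: int, current: int) -> float:
    """Fractional follower change between two snapshots (0.025 means +2.5%)."""
    if previous <= 0:
        raise ValueError("previous follower count must be positive")
    return (current - previous) / previous


def monthly_cost(profiles: int, runs_per_month: int,
                 price_per_profile: float = 0.004) -> float:
    """Dollars per month for scraping `profiles` profiles `runs_per_month` times."""
    return profiles * runs_per_month * price_per_profile
```

Storing one snapshot per run and comparing consecutive snapshots is enough to produce a weekly or daily growth series per competitor.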
Brand monitoring: Check whether your brand is mentioned in public post captions on profiles you care about. (Stories are not among the fields Method 1 extracts.)
Market research: Analyze what content performs in your niche by scraping public profiles and ranking posts by engagement.
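When posts from several profiles are ranked together, raw like counts favor the biggest accounts, so it can help to normalize by the owner's follower count. This is a sketch under assumed field names: `ownerUsername`, `likesCount`, and `commentsCount` are hypothetical keys, and `rank_posts` is not part of any scraper's API.

```python
from typing import Dict, List


def rank_posts(posts: List[dict], followers_by_user: Dict[str, int],
               top_n: int = 10) -> List[dict]:
    """Return the `top_n` posts by interactions per follower, best first."""

    def score(post: dict) -> float:
        followers = followers_by_user.get(post.get("ownerUsername", ""), 0)
        if followers <= 0:
            return 0.0  # unknown audience size, so rank it last
        interactions = post.get("likesCount", 0) + post.get("commentsCount", 0)
        return interactions / followers

    return sorted(posts, key=score, reverse=True)[:top_n]
```

With this scoring, a post with 100 likes from a 1,000-follower account outranks one with 10,000 likes from a 1,000,000-follower account.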
Summary: Which Method to Use
| Use Case | Method | Why |
|---|---|---|
| One-off research | Apify actor | Easiest, no infrastructure |
| Daily monitoring | Apify scheduled run | Set-and-forget, reliable |
| High volume (10k+/day) | Playwright + proxies | More control, lower per-unit cost |
| Own account data | Official API | Free, within ToS |
For most teams: start with Apify. At $0.004/profile, the cost is negligible vs developer time spent maintaining a custom scraper.
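Whether a self-hosted scraper actually undercuts $0.004/profile depends mostly on how much proxy bandwidth each page load consumes and how often a load fails. The sketch below makes that explicit; `mb_per_attempt` is an assumption you would need to measure yourself, and compute and maintenance costs are deliberately left out.

```python
def playwright_cost_per_profile(price_per_gb: float, mb_per_attempt: float,
                                success_rate: float) -> float:
    """Proxy bandwidth cost in dollars per successfully scraped profile."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    if price_per_gb < 0 or mb_per_attempt < 0:
        raise ValueError("price and bandwidth must be non-negative")
    # Failed attempts still consume bandwidth, so divide by the success rate.
    attempts_per_success = 1.0 / success_rate
    gb_per_attempt = mb_per_attempt / 1024.0
    return price_per_gb * gb_per_attempt * attempts_per_success
```

Plugging in the article's ranges ($8-15/GB, 60-80% success) with your own measured page weight shows quickly on which side of the $0.004 line a given setup lands.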
The Apify Scrapers Bundle includes the Instagram Profile Scraper along with 34 other production actors — Google SERP, Amazon, LinkedIn, TikTok Shop, contact info. Pre-configured inputs so you're running in 10 minutes.