Olamide Olaniyan

Scraping Instagram Without Getting Banned: What Actually Works in 2026

I've scraped over 10 million Instagram profiles in the past year.

Zero bans. Zero legal letters. Zero problems.

But I've also seen developers get blocked within 24 hours of their first request.

The difference? Understanding how Instagram detects scrapers—and how to avoid those triggers.

This isn't theory. It's what actually works in production.

Why Instagram Bans Scrapers

Let's be clear about what triggers Instagram's anti-bot systems:

Detection Signals

  1. Request patterns: Humans don't request 100 profiles in 60 seconds
  2. Fingerprinting: Browser characteristics, headers, TLS fingerprints
  3. IP reputation: Data center IPs are flagged immediately
  4. Session behavior: Login → immediate bulk scraping = obvious bot
  5. Geographic anomalies: Logging in from 5 countries in an hour

What Gets You Banned

❌ Using a single IP for thousands of requests
❌ Requesting at machine speed (no delays)
❌ Missing or incorrect headers
❌ Using known data center IP ranges
❌ Scraping while logged into an account you care about
❌ Ignoring rate limit responses

What Instagram Does

  • Soft block: CAPTCHAs, "Try Again Later" messages
  • Checkpoint: Requires phone verification
  • Shadowban: Requests return empty/limited data
  • Hard ban: IP blocked, account disabled

Let's avoid all of these.

The Safe Approach: Public Data Only

First, the safest strategy: only scrape publicly visible data.

No login required. No account at risk. No Terms of Service gymnastics.

What's publicly available:

  • Public profile info (bio, follower counts, post counts)
  • Public posts (images, captions, likes, comments)
  • Public hashtag pages
  • Public location pages

What requires login:

  • Private accounts
  • Stories (most)
  • Direct messages
  • Detailed follower lists

For most use cases, public data is enough. And it's much safer to scrape.

Rate Limiting: The Most Important Rule

This is where 90% of developers go wrong.

Instagram's implicit rate limits (approximate):

| Action          | Safe Limit | Aggressive Limit |
|-----------------|------------|------------------|
| Profile views   | 100/hour   | 200/hour         |
| Post views      | 200/hour   | 400/hour         |
| Search queries  | 50/hour    | 100/hour         |
| Comment fetches | 100/hour   | 200/hour         |

My production settings:

import time
import random

class RateLimiter:
    def __init__(self, requests_per_hour: int = 100):
        self.requests_per_hour = requests_per_hour
        self.min_delay = 3600 / requests_per_hour  # seconds between requests
        self.last_request = 0

    def wait(self):
        """Wait appropriate time between requests."""
        elapsed = time.time() - self.last_request

        if elapsed < self.min_delay:
            # Add randomness to avoid pattern detection
            delay = self.min_delay - elapsed
            jitter = random.uniform(0.5, 2.0)  # scale the remaining wait by 50-200% so intervals aren't uniform
            time.sleep(delay * jitter)

        self.last_request = time.time()

# Usage
limiter = RateLimiter(requests_per_hour=80)  # Conservative

for username in usernames:
    limiter.wait()
    profile = scrape_profile(username)

Key insight: The jitter is critical. Exact intervals (every 36 seconds) are more suspicious than random intervals (25-50 seconds).
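
To see what that looks like, here's a quick illustrative snippet comparing a fixed cadence with a jittered one (the 0.7-1.4 range roughly matches the 25-50 second spread mentioned above):

import random

base = 3600 / 100  # 36 seconds between requests at 100/hour
fixed = [round(base, 1)] * 5
jittered = [round(base * random.uniform(0.7, 1.4), 1) for _ in range(5)]

print("fixed:   ", fixed)     # [36.0, 36.0, 36.0, 36.0, 36.0]
print("jittered:", jittered)  # e.g. [27.9, 44.6, 39.2, 50.1, 30.4]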

IP Rotation: Essential for Scale

A single residential IP can handle ~100-200 requests/hour safely.

For more, you need IP rotation.

Option 1: Residential Proxies

Best for Instagram because they look like real users.

import requests
from itertools import cycle

class ProxyRotator:
    def __init__(self, proxy_list: list):
        self.proxies = cycle(proxy_list)
        self.current = next(self.proxies)
        self.request_count = 0
        self.rotate_after = 50  # Rotate every 50 requests

    def get_proxy(self) -> dict:
        self.request_count += 1

        if self.request_count >= self.rotate_after:
            self.current = next(self.proxies)
            self.request_count = 0

        return {
            "http": self.current,
            "https": self.current
        }

# Usage with residential proxy provider
proxies = [
    "http://user:pass@residential1.proxy.com:8080",
    "http://user:pass@residential2.proxy.com:8080",
    # ... more proxies
]

rotator = ProxyRotator(proxies)

response = requests.get(
    "https://www.instagram.com/username/",
    proxies=rotator.get_proxy()
)

Cost reality: Residential proxies cost $5-15 per GB. Budget accordingly.

Option 2: Mobile Proxies

Even better than residential—they share IPs with thousands of real users.

More expensive ($20-50/GB) but nearly impossible to detect.
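
Many mobile providers expose a single rotating gateway instead of a proxy list, so the client code is even simpler than residential rotation. A minimal sketch (the gateway host, port, and credentials below are placeholders):

import requests

# Hypothetical rotating mobile gateway: each request exits from a different mobile IP
MOBILE_PROXY = "http://user:pass@mobile-gateway.example.com:7000"

response = requests.get(
    "https://www.instagram.com/username/",
    proxies={"http": MOBILE_PROXY, "https": MOBILE_PROXY},
    timeout=30,
)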

Option 3: Use an API Service

Honestly? This is what I recommend for most developers.

Services like SociaVault handle proxy rotation, rate limiting, and anti-detection for you. You just make API calls.

import requests

API_KEY = "your_api_key"

# No proxy management, no rate limit handling
# Just clean data
response = requests.get(
    "https://api.sociavault.com/v1/scrape/instagram/profile",
    params={"username": "instagram"},
    headers={"Authorization": f"Bearer {API_KEY}"}
)

profile = response.json()["data"]
print(f"Followers: {profile['follower_count']:,}")

When to DIY vs use an API:

| Scenario               | Recommendation     |
|------------------------|--------------------|
| < 1,000 requests/day   | DIY is fine        |
| 1,000 - 10,000/day     | Consider API       |
| > 10,000/day           | Definitely use API |
| Need reliability       | Use API            |
| Learning/experimenting | DIY for education  |

Headers and Fingerprinting

Instagram checks your request headers. Missing or weird headers = instant suspicion.

Minimum Required Headers

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
}
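
Headers only cover part of the fingerprint. Instagram can also inspect the TLS handshake itself, and plain requests has a recognizably non-browser TLS signature. One way to address that is a client that impersonates a browser's TLS fingerprint, such as curl_cffi. This is a sketch; check your installed version for the impersonation targets it supports:

# pip install curl_cffi
from curl_cffi import requests as curl_requests

# impersonate mimics a real Chrome TLS/JA3 fingerprint
# (newer versions also accept the generic "chrome" target)
response = curl_requests.get(
    "https://www.instagram.com/instagram/",
    impersonate="chrome110",
    headers={"Accept-Language": "en-US,en;q=0.9"},
)
print(response.status_code)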

Rotate User Agents

Don't use the same User-Agent for all requests:

import random

USER_AGENTS = [
    # Chrome on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    # Chrome on Mac
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    # Firefox on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    # Safari on Mac
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    # Edge on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
]

def get_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        # ... other headers
    }

Session Management

For public data, you don't need to log in. But if you do need authenticated access:

Never Use Your Main Account

Create dedicated scraping accounts. Expect them to get banned eventually.

Warm Up New Accounts

New accounts that immediately start scraping get flagged. Warm them up:

def warm_up_account(session, days=7):
    """
    Simulate normal user behavior before scraping.
    Run this for a week before starting bulk operations.
    """
    daily_actions = [
        ("scroll_feed", 10),      # View 10 feed posts
        ("view_stories", 5),      # View 5 stories
        ("like_posts", 3),        # Like 3 posts
        ("view_profiles", 5),     # View 5 profiles
        ("search", 2),            # Do 2 searches
    ]

    for action, count in daily_actions:
        for _ in range(count):
            # Random delay between actions (30s - 5min)
            time.sleep(random.uniform(30, 300))

            # Perform action
            if action == "scroll_feed":
                session.get_feed()
            elif action == "view_profiles":
                session.view_random_profile()
            # ... etc

Maintain Session Cookies

Don't create new sessions constantly:

import pickle
import os

SESSION_FILE = "instagram_session.pkl"

def save_session(session):
    with open(SESSION_FILE, "wb") as f:
        pickle.dump(session.cookies, f)

def load_session(session):
    if os.path.exists(SESSION_FILE):
        with open(SESSION_FILE, "rb") as f:
            session.cookies.update(pickle.load(f))
    return session
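
Typical usage: load the saved cookies when the scraper starts, and save them again when it finishes.

import requests

session = requests.Session()
session = load_session(session)   # reuse cookies from the previous run, if any

# ... scraping happens here ...

save_session(session)             # persist cookies for the next run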

Handling Blocks Gracefully

Even with precautions, you'll occasionally hit blocks. Handle them properly:

import time
from enum import Enum

class BlockType(Enum):
    NONE = "none"
    RATE_LIMIT = "rate_limit"
    SOFT_BLOCK = "soft_block"
    CHECKPOINT = "checkpoint"
    HARD_BLOCK = "hard_block"

def detect_block(response) -> BlockType:
    """Detect what type of block we've hit."""

    if response.status_code == 429:
        return BlockType.RATE_LIMIT

    if response.status_code == 403:
        return BlockType.HARD_BLOCK

    text = response.text.lower()

    if "try again later" in text:
        return BlockType.SOFT_BLOCK

    if "checkpoint" in text or "verify" in text:
        return BlockType.CHECKPOINT

    if "login" in text and response.status_code == 200:
        # Redirected to login = possible block
        return BlockType.SOFT_BLOCK

    return BlockType.NONE

def handle_block(block_type: BlockType, rotator: ProxyRotator):
    """Respond appropriately to different block types."""

    if block_type == BlockType.RATE_LIMIT:
        print("Rate limited. Waiting 15 minutes...")
        time.sleep(900)
        return True  # Retry

    elif block_type == BlockType.SOFT_BLOCK:
        print("Soft block detected. Rotating IP and waiting 1 hour...")
        rotator.force_rotate()
        time.sleep(3600)
        return True  # Retry

    elif block_type == BlockType.CHECKPOINT:
        print("Checkpoint required. This IP/account is burned.")
        rotator.force_rotate()
        return False  # Don't retry with this account

    elif block_type == BlockType.HARD_BLOCK:
        print("Hard block. IP is banned.")
        rotator.blacklist_current()
        rotator.force_rotate()
        return True  # Retry with new IP

    return True

# Usage in scraping loop
def scrape_with_recovery(usernames: list, rotator: ProxyRotator):
    results = []

    for username in usernames:
        max_retries = 3

        for attempt in range(max_retries):
            response = requests.get(
                f"https://www.instagram.com/{username}/",
                proxies=rotator.get_proxy(),
                headers=get_headers()
            )

            block_type = detect_block(response)

            if block_type == BlockType.NONE:
                results.append(parse_profile(response))
                break
            else:
                should_retry = handle_block(block_type, rotator)
                if not should_retry:
                    break

        limiter.wait()  # Always respect rate limits

    return results
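
One note: handle_block calls force_rotate() and blacklist_current(), which the ProxyRotator shown earlier doesn't define. You can add those methods to the class directly, or subclass it; here's a minimal sketch:

class RecoveringProxyRotator(ProxyRotator):
    """ProxyRotator from earlier, extended with the methods handle_block expects."""

    def __init__(self, proxy_list: list):
        super().__init__(proxy_list)
        self.banned = set()

    def force_rotate(self):
        # Switch to the next proxy immediately, regardless of request count
        self.current = next(self.proxies)
        self.request_count = 0

    def blacklist_current(self):
        # Mark the current proxy as dead and rotate past it
        # (assumes at least one proxy in the pool is still usable)
        self.banned.add(self.current)
        self.force_rotate()
        while self.current in self.banned:
            self.current = next(self.proxies)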

The Production Setup I Use

Here's my actual scraping architecture:

┌─────────────────────────────────────────────────────┐
│                   Job Queue (Redis)                  │
│     [username1, username2, username3, ...]          │
└─────────────────────┬───────────────────────────────┘
                      │
          ┌───────────┼───────────┐
          ▼           ▼           ▼
    ┌─────────┐ ┌─────────┐ ┌─────────┐
    │ Worker 1│ │ Worker 2│ │ Worker 3│
    │ (IP: A) │ │ (IP: B) │ │ (IP: C) │
    └────┬────┘ └────┬────┘ └────┬────┘
         │           │           │
         ▼           ▼           ▼
    ┌─────────────────────────────────────────────────┐
    │            Proxy Rotation Pool                   │
    │     100+ Residential IPs, Auto-rotation         │
    └─────────────────────┬───────────────────────────┘
                          │
                          ▼
    ┌─────────────────────────────────────────────────┐
    │                  Instagram                       │
    └─────────────────────────────────────────────────┘

Key components:

  1. Job queue: Redis handles work distribution
  2. Multiple workers: Each worker has its own rate limiter
  3. Shared proxy pool: Workers share proxies, coordinate to avoid overlap
  4. Circuit breaker: If error rate exceeds 20%, pause all workers

# Simplified worker code
import json
import redis
import time

class InstagramWorker:
    def __init__(self, worker_id: str, redis_client: redis.Redis):
        self.worker_id = worker_id
        self.redis = redis_client
        self.limiter = RateLimiter(requests_per_hour=80)
        self.rotator = ProxyRotator(self.get_proxies())

    def run(self):
        while True:
            # Get next job
            username = self.redis.lpop("instagram:scrape:queue")

            if not username:
                time.sleep(10)
                continue

            username = username.decode()

            # Check circuit breaker
            if self.is_circuit_open():
                self.redis.rpush("instagram:scrape:queue", username)
                time.sleep(300)
                continue

            # Scrape
            self.limiter.wait()
            result = self.scrape_profile(username)

            if result:
                self.redis.hset("instagram:profiles", username, json.dumps(result))
                self.record_success()
            else:
                self.record_failure()

    def is_circuit_open(self) -> bool:
        """Check if error rate is too high."""
        failures = int(self.redis.get("instagram:failures:recent") or 0)
        successes = int(self.redis.get("instagram:successes:recent") or 0)

        total = failures + successes
        if total < 10:
            return False

        error_rate = failures / total
        return error_rate > 0.2  # 20% threshold
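
The record_success() and record_failure() helpers aren't shown above. A minimal sketch using short-lived Redis counters, so the error rate the circuit breaker reads reflects recent traffic only (the 10-minute window is an arbitrary choice):

import redis

def record_result(redis_client: redis.Redis, success: bool, window_seconds: int = 600):
    """Increment the rolling success/failure counters used by is_circuit_open().
    Each key expires window_seconds after its last increment, so stale counts fade out."""
    key = "instagram:successes:recent" if success else "instagram:failures:recent"
    pipe = redis_client.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    pipe.execute()

# Inside the worker: record_success() -> record_result(self.redis, True)
#                    record_failure() -> record_result(self.redis, False)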

Realistic Expectations

Let me be honest about what's achievable:

With DIY scraping:

  • 5,000-10,000 profiles/day safely
  • Requires proxy costs ($50-200/month)
  • Requires maintenance and monitoring
  • Will occasionally get blocked

With an API service:

  • 50,000+ profiles/day
  • Fixed, predictable costs
  • No maintenance
  • Better reliability

My recommendation:

If you're scraping Instagram for a business or serious project, use an API. The time saved on proxy management, block handling, and maintenance is worth the cost.

If you're learning or have a small personal project, DIY is educational and cost-effective.

Quick Start: Safe Instagram Scraping

Here's a minimal, safe scraper you can run today:

#!/usr/bin/env python3
"""
Safe Instagram profile scraper.
Scrapes public data only, respects rate limits.
"""

import requests
import time
import random
import json
import re

class SafeInstagramScraper:
    def __init__(self):
        self.session = requests.Session()
        self.last_request = 0
        self.min_delay = 45  # 80 requests/hour

        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
        })

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_delay:
            delay = self.min_delay - elapsed + random.uniform(5, 15)
            time.sleep(delay)
        self.last_request = time.time()

    def scrape_profile(self, username: str) -> dict:
        self.wait()

        try:
            response = self.session.get(
                f"https://www.instagram.com/{username}/",
                timeout=30
            )

            if response.status_code != 200:
                return {"error": f"Status {response.status_code}"}

            # Extract JSON data from page
            match = re.search(
                r'<script type="application/ld\+json">(.*?)</script>',
                response.text,
                re.DOTALL
            )

            if match:
                data = json.loads(match.group(1))
                return {
                    "username": username,
                    "name": data.get("name"),
                    "description": data.get("description"),
                    "url": data.get("url"),
                }

            return {"error": "Could not parse profile"}

        except Exception as e:
            return {"error": str(e)}

# Usage
if __name__ == "__main__":
    scraper = SafeInstagramScraper()

    usernames = ["instagram", "cristiano", "kyliejenner"]

    for username in usernames:
        result = scraper.scrape_profile(username)
        print(f"{username}: {result}")

Conclusion

Scraping Instagram safely is about understanding the rules:

  1. Rate limit aggressively - slower is safer
  2. Rotate IPs - residential proxies for scale
  3. Look human - proper headers, random delays
  4. Handle blocks gracefully - don't hammer when blocked
  5. Stick to public data - less risk, fewer problems

For serious production use, consider an API service that handles the complexity for you. Check out our Instagram scraping API if you want reliability without the hassle.

