
ScrapeSmith


How to Scrape Instagram Comments at Scale — 1 Million Results and Counting

Instagram's official API is nearly useless if you need comment data at any real scale. Rate limits kick in almost immediately, OAuth setup is painful, and the data you actually get back is heavily filtered. For anyone doing sentiment analysis, influencer research, lead generation, or feeding an NLP pipeline — the official API just doesn't cut it.
I've been running an Instagram comments scraper on Apify that has now processed over 1 million comments across hundreds of posts and reels. This article covers how it works, what the data looks like, and how to use it in your own Python pipeline.

What People Actually Use This For
Before getting into the technical side, here are the real use cases that show up most:
Sentiment analysis — feeding large comment datasets into NLP models to understand audience reactions to a brand, product launch, or campaign. You need thousands of comments to get statistically meaningful results. The official API won't get you there.
Influencer vetting — before signing a sponsorship deal, brands want to know if an influencer's engagement is genuine. Fake engagement leaves patterns in comment data: repetitive text, suspiciously high like ratios, bot-like usernames. You can only catch this with bulk data.
Competitor research — analysing what people are saying in the comments of a competitor's top posts reveals customer pain points, feature requests, and sentiment gaps you can act on.
Lead generation — people comment on posts asking where to buy something, requesting recommendations, or expressing a specific need. Scraping comments from relevant posts surfaces these warm leads at scale.
Academic and social research — studying language patterns, community behaviour, and discourse on social platforms. Instagram comment datasets are used in published NLP and sociology research.
AI training data — clean, structured conversational text in multiple languages for fine-tuning language models.
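The "repetitive text" signal mentioned under influencer vetting can be roughed out in a few lines. This is a minimal sketch, not a validated fake-engagement detector: the helper names `normalise` and `repeated_comment_share` are hypothetical, and the only field it assumes is the `text` key from the scraper output.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical comments match."""
    return " ".join(text.lower().split())


def repeated_comment_share(comments: list[dict]) -> float:
    """Fraction of non-empty comments whose normalised text occurs more than once."""
    texts = [normalise(c.get("text") or "") for c in comments]
    texts = [t for t in texts if t]
    if not texts:
        return 0.0
    counts = Counter(texts)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(texts)


# Hypothetical sample data for illustration only
sample = [
    {"text": "Love this!"},
    {"text": "love  this!"},
    {"text": "Where can I buy it?"},
    {"text": "LOVE THIS!"},
]
print(f"{repeated_comment_share(sample):.0%} of comments are repeats")
```

A high share on its own proves nothing, since short reactions repeat naturally on genuine posts, so treat it as one input alongside like counts and username patterns.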

The Data You Get
Here is exactly what the scraper returns for each comment:
```json
{
  "postId": "3627347799613702778",
  "postUrl": "https://www.instagram.com/p/DJW7ZrwxQJ6/",
  "commentId": "17856813987670336",
  "commentUrl": "https://www.instagram.com/p/DJW7ZrwxQJ6/c/17856813987670336/",
  "text": "This is exactly what I was looking for",
  "timestamp": 1772885404,
  "likesCount": 14,
  "userId": "63583756795",
  "username": "example_user",
  "userFullName": "Example User",
  "ownerProfilePicUrl": "https://scontent.cdninstagram.com/...",
  "isVerified": false,
  "repliesCount": 2,
  "replies": [
    {
      "commentId": "17856813987670337",
      "text": "Same here!",
      "username": "another_user",
      "likesCount": 3,
      "timestamp": 1772885500
    }
  ]
}
```
Every comment includes the full text, timestamp, like count, commenter username, verified status, and nested replies. No preprocessing needed — the output is clean and ready for ingestion.
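One small conversion is still handy for time-based analysis. The sketch below assumes the `timestamp` field is a Unix epoch in seconds (the sample value is consistent with that, but the article does not state it) and turns it into a timezone-aware datetime. The helper name `comment_time` is hypothetical.

```python
from datetime import datetime, timezone


def comment_time(comment: dict) -> datetime:
    """Convert the comment's timestamp (assumed Unix seconds) to a UTC datetime."""
    return datetime.fromtimestamp(comment["timestamp"], tz=timezone.utc)


# Trimmed copy of the sample record above
comment = {
    "text": "This is exactly what I was looking for",
    "timestamp": 1772885404,
    "likesCount": 14,
}
print(comment_time(comment).isoformat())
```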

Getting Started in Python
The scraper runs on Apify. You need a free account. Every account gets $5 in monthly credits, which at $0.50 per 1,000 results is enough to pull around 10,000 comments to test with.
Install the Apify client:
```bash
pip install apify-client
```
Basic run — scrape comments from a single post:
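```python
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Actor input: one post, up to 500 comments, most popular first
run_input = {
    "postUrls": [
        "https://www.instagram.com/p/YOUR_POST_ID/"
    ],
    "maxCommentsPerPost": 500,
    "sortOrder": "popular",
}

# Start the actor and wait for the run to finish
run = client.actor("scrapesmith/instagram-comments-scraper").call(run_input=run_input)

# Stream the results from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["username"], "|", item["text"][:80])
```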

Bulk Scraping — Multiple Posts at Once
For research or competitive analysis you usually need data from many posts simultaneously. Pass a list of URLs:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Every post you want comments from
post_urls = [
    "https://www.instagram.com/p/POST_ID_1/",
    "https://www.instagram.com/p/POST_ID_2/",
    "https://www.instagram.com/p/POST_ID_3/",
    # add as many as needed
]

# Same input shape as the single-post run, with a higher per-post cap
run_input = {
    "postUrls": post_urls,
    "maxCommentsPerPost": 1000,
    "sortOrder": "newest",
}

# One actor run covers all the posts in the list
run = client.actor("scrapesmith/instagram-comments-scraper").call(run_input=run_input)

# Load the whole dataset into memory as a list of dicts
dataset = client.dataset(run["defaultDatasetId"])
comments = list(dataset.iterate_items())

print(f"Total comments scraped: {len(comments)}")
```
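
After a bulk run it can help to check how the total splits across posts, for example to spot a post that returned nothing. A minimal sketch, assuming each item carries the `postUrl` field shown in the sample record. The helper name `comments_per_post` and the sample list are hypothetical stand-ins for the real `comments` list.

```python
from collections import Counter


def comments_per_post(comments: list[dict]) -> Counter:
    """Count comments per post URL; items without a postUrl fall under 'unknown'."""
    return Counter(c.get("postUrl", "unknown") for c in comments)


# Hypothetical stand-in for the `comments` list returned by a bulk run
sample_comments = [
    {"postUrl": "https://www.instagram.com/p/POST_ID_1/", "text": "Nice"},
    {"postUrl": "https://www.instagram.com/p/POST_ID_1/", "text": "Great"},
    {"postUrl": "https://www.instagram.com/p/POST_ID_2/", "text": "Where to buy?"},
]

for url, count in comments_per_post(sample_comments).most_common():
    print(f"{count:>5}  {url}")
```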
At $0.50 per 1,000 results, scraping 100,000 comments costs $50. For the volume of data you get back, that is significantly cheaper than any alternative.

No code required.

Practical Notes from Running at Scale
A few things worth knowing before you start a large run:
Target posts with at least 10 comments. Posts with very few comments, or posts from heavily restricted accounts, may return empty datasets.
Use sortOrder: "popular" for quality, "newest" for recency. Popular sort returns the highest-engagement comments first, which is better for sentiment analysis. Newest sort is better if you're monitoring for recent activity.
Replies are nested inside comments. The replies array is included by default. If you're flattening the data for analysis, remember to extract replies separately.
No login or cookies needed. The scraper works entirely on public data. You don't need an Instagram account, session token, or proxy setup.
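
The note on nested replies can be handled with a small flattening step. This is a sketch under the assumption that replies follow the shape in the sample record. The function name `flatten_comments` and the added `parentCommentId` column are hypothetical and not part of the scraper's output.

```python
def flatten_comments(comments: list[dict]) -> list[dict]:
    """Return one row per comment and per reply, linking replies to their parent."""
    rows = []
    for comment in comments:
        # Top-level comment: copy every field except the nested replies
        parent = {k: v for k, v in comment.items() if k != "replies"}
        parent["parentCommentId"] = None
        rows.append(parent)
        # Replies: carry over the parent's ID and post ID for later joins
        for reply in comment.get("replies") or []:
            row = dict(reply)
            row["parentCommentId"] = comment.get("commentId")
            row.setdefault("postId", comment.get("postId"))
            rows.append(row)
    return rows


# Trimmed copy of the sample record from earlier in the article
sample = [
    {
        "postId": "3627347799613702778",
        "commentId": "17856813987670336",
        "text": "This is exactly what I was looking for",
        "replies": [
            {"commentId": "17856813987670337", "text": "Same here!"},
        ],
    }
]

for row in flatten_comments(sample):
    print(row["commentId"], row["parentCommentId"], row["text"])
```

Keeping the parent ID on each reply row means the thread structure can be rebuilt later even after the data is loaded into a flat table.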

Pricing
Pay-per-result at $0.50 per 1,000 comments. You only pay for data you actually receive — no subscription, no minimum. Every new Apify account gets $5 in free monthly credits to test with.
| Volume | Cost |
|---|---|
| 10,000 comments | $5.00 |
| 100,000 comments | $50.00 |
| 1,000,000 comments | $500.00 |
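
The pricing is linear, so a budget estimate is a one-line calculation. The sketch below uses the $0.50 per 1,000 results rate quoted in this article. The constant and function names are hypothetical, and the rate should be checked against the current Apify Store listing before planning a large run.

```python
from decimal import Decimal

# Rate quoted in this article; verify against the live listing
PRICE_PER_1000 = Decimal("0.50")


def estimated_cost(n_comments: int) -> Decimal:
    """Estimated cost in USD for n_comments results, rounded to cents."""
    cost = Decimal(n_comments) / 1000 * PRICE_PER_1000
    return cost.quantize(Decimal("0.01"))


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} comments -> ${estimated_cost(n)}")
```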

Try It
The scraper is live on the Apify Store:
Instagram Comments Scraper — scrapesmith/instagram-comments-scraper
My full actor catalogue including YouTube Shorts, Instagram Hashtag, Google Maps Reviews, and more is at apify.com/scrapesmith.
If you need a custom scraper built for a specific site or workflow, reach out via the Apify profile.

Built and maintained by Scrape Smith. 2.1K+ users, 99%+ run success rate.
