DEV Community

Mirfa Zainab
How to Scrape Comments, Likes, or Other Interactions from Instagram

Scraping post interactions (comments, likes, replies, timestamps, user handles) lets you analyze engagement quality, sentiment, and growth. Below is a practical, safe-by-design approach you can implement with the reference code in this repo:
https://github.com/Instagram-Automations/instagram-post-scraper

Ground Rules (Read First)

Respect terms & law: Always follow Instagram’s Terms of Use and local laws. Prefer first-party APIs where possible.

Be transparent: If you process user data, disclose it and store only what you need.

Throttle requests: Human-like pacing reduces the chance of blocks.

For an implementation baseline, review the examples in the repo: instagram-post-scraper.

Approaches (Choose Based on Your Needs)
1) Official/Partner APIs (Safest)

For Business/Creator accounts you control, use Meta’s Graph APIs to fetch comments and replies reliably.

Pros: Stable, policy-compliant, structured data.

Cons: Limited to authorized assets; no public-wide coverage.
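As a rough sketch of the official route, the helpers below build a request URL for a media object's comments and flatten one response page into rows. The API version string, the field list, and the helper names (build_comments_url, parse_comments_page) are assumptions for illustration; check Meta's current Graph API reference for the fields and permissions your token actually has.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"
API_VERSION = "v19.0"  # assumption: substitute the version your app targets

# Assumed field selection; replies are requested inline via field expansion.
COMMENT_FIELDS = (
    "id,text,timestamp,username,like_count,"
    "replies{id,text,timestamp,username,like_count}"
)


def build_comments_url(media_id: str, access_token: str, limit: int = 50) -> str:
    """Build the URL for the comments edge of a media object you are authorized for."""
    query = urlencode(
        {"fields": COMMENT_FIELDS, "limit": limit, "access_token": access_token}
    )
    return f"{GRAPH_BASE}/{API_VERSION}/{media_id}/comments?{query}"


def parse_comments_page(payload: dict) -> Tuple[List[dict], Optional[str]]:
    """Flatten one response page into rows and return the next-page URL, if any."""
    rows: List[dict] = []
    for comment in payload.get("data", []):
        rows.append({**_row(comment), "parent_id": None})
        for reply in comment.get("replies", {}).get("data", []):
            rows.append({**_row(reply), "parent_id": comment["id"]})
    next_url = payload.get("paging", {}).get("next")
    return rows, next_url


def _row(item: dict) -> dict:
    """Map one API item to the column names used in this article's schema."""
    return {
        "comment_id": item["id"],
        "author_username": item.get("username"),
        "text": item.get("text", ""),
        "created_at": item.get("timestamp"),
        "like_count": item.get("like_count", 0),
    }
```

You would fetch the URL with any HTTP client, pass the decoded JSON to parse_comments_page, and keep requesting the returned next-page URL until it is None.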

2) Lightweight HTML Parsing (Fastest Setup)

For public posts, fetch the post page, extract embedded JSON (sharedData / GraphQL) and parse:

Post metadata: id, shortcode, caption, owner

Edges: comment text, user, timestamp, likes on comments

Add pagination by following end_cursor tokens.

See pagination patterns in the repo’s utilities: GitHub code.
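A minimal sketch of the two moving parts described above: pulling an embedded JSON object out of page source, and walking pages by cursor. The marker string, the page shape (edges, node, page_info), and the fetch_page callable are assumptions for illustration; the real markup changes over time, so treat the names as placeholders rather than Instagram's current format.

```python
import json
from typing import Callable, Iterator, Optional


def extract_embedded_json(html: str, marker: str = "window._sharedData") -> dict:
    """Decode the first JSON object that follows `marker` in the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found; the page layout may have changed")
    brace = html.index("{", start)  # raises ValueError if no object follows
    # raw_decode parses one JSON value and ignores whatever trails it.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


def paginate_edges(
    fetch_page: Callable[[Optional[str]], dict], max_pages: int = 50
) -> Iterator[dict]:
    """Yield unique nodes, following end_cursor until the pages run out.

    `fetch_page(cursor)` is a hypothetical callable that returns a dict with
    "edges" (a list of {"node": {...}}) and "page_info".
    """
    seen = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        new_nodes = 0
        for edge in page.get("edges", []):
            node = edge["node"]
            if node["id"] in seen:
                continue  # duplicate edge ID: skip it
            seen.add(node["id"])
            new_nodes += 1
            yield node
        info = page.get("page_info", {})
        # Stop on the last page, or when a page brought nothing new.
        if not info.get("has_next_page") or new_nodes == 0:
            break
        cursor = info.get("end_cursor")
```

Injecting fetch_page keeps the loop independent of the HTTP client, so the same guard logic works whether pages come from a plain request or a headless session.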

3) Headless Browser (Most Robust)

Use Playwright/Puppeteer to render dynamic content and infinite-scroll the comments/likers drawer.

Rotate proxies, set viewport & language, and inject think-time between actions.

Ideal when HTML endpoints change or require user interaction to reveal more items.
Reference flows: example scripts in the repo.
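The scroll-and-collect loop can be sketched without committing to a particular driver. In the example below, read_visible and scroll_once are hypothetical callables that you would implement with your Playwright or Puppeteer calls; the function name, the patience threshold, and the delay range are assumptions, not values taken from the repo.

```python
import random
import time
from typing import Callable, Iterable, List, Tuple


def harvest_until_stable(
    read_visible: Callable[[], Iterable[str]],
    scroll_once: Callable[[], None],
    max_rounds: int = 200,
    patience: int = 3,
    delay_range: Tuple[float, float] = (0.8, 2.5),
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Scroll and collect items until `patience` consecutive rounds add nothing new."""
    seen = {}  # dict keys keep first-seen order and deduplicate
    idle_rounds = 0
    for _ in range(max_rounds):
        before = len(seen)
        for item in read_visible():
            seen.setdefault(item, None)
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        if idle_rounds >= patience:
            break
        scroll_once()
        # Randomized think-time between actions.
        sleep(random.uniform(*delay_range))
    return list(seen)
```

The sleep parameter exists so the loop can be exercised in tests without real waiting; in a live run the default time.sleep applies.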

What to Capture (Minimal Clean Schema)

Post

post_id, shortcode, owner_username, caption, taken_at, like_count, comment_count

Comment

comment_id, post_id, author_username, text, created_at, like_count, parent_id (for replies)

Liker

post_id, username, fetched_at

The repo includes patterns for mapping these fields cleanly:
https://github.com/Instagram-Automations/instagram-post-scraper
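One way to pin the schema down in code is a set of small dataclasses. This is a sketch using the field names listed above; the types, the defaults, and the from_unix helper are assumptions, since the article does not specify them.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Post:
    post_id: str
    shortcode: str
    owner_username: str
    caption: str
    taken_at: datetime
    like_count: int
    comment_count: int


@dataclass(frozen=True)
class Comment:
    comment_id: str
    post_id: str
    author_username: str
    text: str
    created_at: datetime
    like_count: int = 0
    parent_id: Optional[str] = None  # set only for replies


@dataclass(frozen=True)
class Liker:
    post_id: str
    username: str
    fetched_at: datetime


def from_unix(ts: int) -> datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

Frozen dataclasses are hashable, which makes it easy to drop exact duplicates with a set before writing rows out, and asdict converts a record to a plain dict for storage.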

Anti-Block & Reliability Checklist

Rotate IPs/ASNs: Use residential or datacenter proxies with pool rotation.

Session hygiene: Reuse authenticated sessions cautiously; refresh cookies when needed.

Human cadence: Randomized delays and keystroke-by-keystroke typing in headless runs.

Retry/backoff: Exponential backoff on 429/403; circuit-break on repeated errors.

Pagination guards: Stop when has_next_page=false or duplicate edge IDs appear.

Deduplicate: Use (post_id, comment_id) or (post_id, username, created_at) as keys.
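The retry/backoff item can be sketched as a small wrapper. HttpStatusError and with_backoff are hypothetical names, and the delays and attempt cap are assumed defaults; the wrapper expects your request function to raise HttpStatusError with the response status.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class HttpStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(
    call: Callable[[], T],
    retry_statuses: Tuple[int, ...] = (429, 403),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given statuses with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HttpStatusError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.status not in retry_statuses or last_attempt:
                raise  # non-retryable status, or retries exhausted
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time up to the exponential ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Letting the final failure propagate gives the caller a natural place to trip a circuit breaker after repeated errors.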

Basic Workflow (Step-by-Step)

Input: Provide a shortcode or full post URL.

Fetch: Request the post page or open it in a headless session.

Extract: Parse embedded JSON for edges (comments/likers).

Paginate: Follow end_cursor until exhausted or you reach a limit.

Normalize: Map fields to the schema above; strip emojis/HTML safely.

Store: Save to SQLite/Postgres/CSV with unique indices.

Monitor: Log rate limits, error codes, and cursor positions to resume.

You’ll find working patterns for steps 2–6 inside:
instagram-post-scraper (GitHub)
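For the Store step, a unique index lets the database handle deduplication. The sketch below uses Python's built-in sqlite3 module; the table layout follows the comment schema above, while the function names and column types are assumptions.

```python
import sqlite3
from typing import Iterable

SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    comment_id      TEXT NOT NULL,
    post_id         TEXT NOT NULL,
    author_username TEXT NOT NULL,
    text            TEXT,
    created_at      TEXT,
    like_count      INTEGER DEFAULT 0,
    parent_id       TEXT,
    UNIQUE (post_id, comment_id)
);
"""

INSERT = """
INSERT OR IGNORE INTO comments
    (comment_id, post_id, author_username, text, created_at, like_count, parent_id)
VALUES
    (:comment_id, :post_id, :author_username, :text, :created_at, :like_count, :parent_id)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_comments(conn: sqlite3.Connection, rows: Iterable[dict]) -> int:
    """Insert rows, silently skipping duplicates; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT, list(rows))
    return conn.total_changes - before
```

Because duplicates are ignored rather than rejected, a resumed run can replay the last page it fetched without special handling.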

Tips for Comments vs. Likers

Comments: Often available via GraphQL edges; replies nest under threaded_comments. Cap the recursion depth and record each reply's parent ID so threads can be rebuilt later.

Likers: Typically revealed after clicking the “likes” count/modal. With headless browsing, scroll the modal and harvest batches until no new entries are found.
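Flattening a nested comment thread while keeping parent IDs might look like the sketch below. The input shape (dicts with id, text, and an optional replies list) is a simplified assumption, not Instagram's actual payload, and flatten_thread is a hypothetical helper name.

```python
from typing import List, Optional


def flatten_thread(
    comments: List[dict],
    parent_id: Optional[str] = None,
    depth: int = 0,
    max_depth: int = 5,
) -> List[dict]:
    """Walk nested comments depth-first, emitting one flat row per comment."""
    rows: List[dict] = []
    for comment in comments:
        rows.append(
            {
                "comment_id": comment["id"],
                "parent_id": parent_id,  # None for top-level comments
                "text": comment.get("text", ""),
                "depth": depth,
            }
        )
        replies = comment.get("replies") or []
        # Depth cap: replies nested below max_depth are not descended into.
        if replies and depth < max_depth:
            rows.extend(flatten_thread(replies, comment["id"], depth + 1, max_depth))
    return rows
```

The resulting rows map directly onto the comment schema's parent_id column, with top-level comments carrying no parent.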

Quick Win Ideas

Sentiment & keywords: Run basic NLP on comments to rank posts by audience mood.

Creator vetting: Cross-reference commenters with follower counts to spot engaged micro-influencers.

Campaign QA: Compare like/comment velocity before/after promotions.
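The velocity comparison is simple arithmetic over comment timestamps. A minimal sketch, assuming you already have created_at values as datetime objects; the function names and the 24-hour default window are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def comments_per_hour(timestamps: Iterable[datetime], start: datetime, end: datetime) -> float:
    """Average comments per hour in the half-open window [start, end)."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in timestamps if start <= t < end)
    return count / hours


def velocity_change(
    timestamps: Iterable[datetime],
    promo_at: datetime,
    window: timedelta = timedelta(hours=24),
) -> Tuple[float, float]:
    """Return (before, after) comment velocity around a promotion start time."""
    stamps = list(timestamps)  # materialize so the data can be scanned twice
    before = comments_per_hour(stamps, promo_at - window, promo_at)
    after = comments_per_hour(stamps, promo_at, promo_at + window)
    return before, after
```

The same functions work for likes if you record a fetched_at timestamp per liker, though that measures when you observed the like rather than when it happened.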

Final Note & CTA

You can adapt and ship a production-ready interaction scraper by starting from the examples and patterns here:
https://github.com/Instagram-Automations/instagram-post-scraper

Explore the code, copy a starter, and extend it for your stack; pagination, dedupe, and anti-block are already outlined. Dive in: instagram-post-scraper on GitHub.
