Instagram research usually breaks down in the same three places:
- You find profiles, but you still do manual qualification.
- You know a brand or influencer is active, but you cannot track story behavior cleanly.
- You see high-intent comments under posts and reels, but nobody turns them into structured leads.
That is exactly why I split the workflow into three focused APIs instead of trying to force everything into one oversized scraper:
- IGLead for profile qualification
- IG_story_snapshot for story activity monitoring
- IG_comment_lead for comment-to-lead extraction
All three are built around Apify actors, but the bigger idea is simple: treat Instagram intelligence like a pipeline, not a one-off scrape.
Why I split this into three APIs
A lot of Instagram tools try to do discovery, enrichment, monitoring, and lead scoring in one place. That sounds convenient until the inputs, auth requirements, and output formats start fighting each other.
I wanted each API to answer one clean question:
- IGLead: Is this profile worth contacting?
- IG_story_snapshot: Is this profile active on stories right now?
- IG_comment_lead: Which commenters look like real demand?
That separation makes the stack easier to maintain, easier to schedule, and easier to plug into downstream automations.
1. IGLead: qualifying influencer and creator profiles before outreach
IGLead starts with a list of Instagram usernames or profile URLs and turns them into scored outreach candidates.
Instead of just scraping follower counts, it combines multiple signals:
- follower count
- recent post engagement
- engagement rate
- business email detection from public bio text
- niche keyword matching
- verification status
- a final lead score and recommendation
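Two of these signals are simple enough to sketch. The snippet below is a minimal Python illustration of bio-based email detection and niche keyword matching. The regex, the function names, and the "fraction of keywords matched" scoring are assumptions for illustration, not IGLead's actual implementation.

```python
from __future__ import annotations

import re

# Assumption: a pragmatic email pattern, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_business_email(bio: str | None) -> str | None:
    """Return the first email-like string found in a public bio, or None."""
    match = EMAIL_RE.search(bio or "")
    return match.group(0) if match else None


def niche_match_score(bio: str | None, keywords: list[str]) -> float:
    """Hypothetical score: fraction of niche keywords found in the bio (case-insensitive)."""
    if not keywords:
        return 0.0
    text = (bio or "").lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)
```

For example, a bio like "Fitness and wellness coach" matched against the three keywords from the example input below would score 2/3. Plain substring matching is deliberately naive here; a real matcher would likely handle word boundaries and synonyms.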
One thing I especially like in this API is that the scoring is not flat. Engagement expectations change depending on account size. A micro creator should not be evaluated like a mega influencer, so the actor adjusts its thresholds by tier.
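To make the tier idea concrete, here is a minimal sketch of tier-adjusted engagement checks. The tier boundaries and per-tier engagement floors are made-up example values, and engagement rate is assumed to be average (likes + comments) per recent post divided by followers. IGLead's real thresholds may differ.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical tiers: (exclusive upper follower bound, tier name, minimum healthy engagement %).
# Bigger accounts usually engage at lower rates, so the floor drops as the tier grows.
TIERS = [
    (10_000, "nano", 4.0),
    (100_000, "micro", 2.5),
    (1_000_000, "macro", 1.5),
    (float("inf"), "mega", 0.8),
]


def tier_for(followers: int) -> tuple[str, float]:
    """Return (tier name, engagement floor) for a follower count."""
    for upper, name, floor in TIERS:
        if followers < upper:
            return name, floor
    raise ValueError("follower count out of range")


def engagement_rate(followers: int, likes: list[int], comments: list[int]) -> float:
    """Average (likes + comments) per post as a percentage of followers."""
    per_post = [lk + cm for lk, cm in zip(likes, comments)]
    if followers <= 0 or not per_post:
        return 0.0
    return mean(per_post) / followers * 100


def passes_tier_threshold(followers: int, likes: list[int], comments: list[int]) -> bool:
    """Compare the account's engagement rate against its own tier's floor."""
    _, floor = tier_for(followers)
    return engagement_rate(followers, likes, comments) >= floor
```

With these example numbers, 1,300 interactions per post clears the micro floor at 50,000 followers (2.6%) but is only 0.065% for a 2,000,000-follower account, which is the kind of difference a flat threshold would miss.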
It also uses multiple extraction paths for reliability:
- Instagram web profile API
- feed endpoints when timeline media is incomplete
- HTML parsing fallbacks
- meta tag parsing for follower and post counts
That matters because Instagram is rarely stable enough for a single-method scraper.
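The meta tag fallback is the easiest of these paths to illustrate. Below is a minimal sketch that pulls counts out of a profile's og:description content string (fetching the HTML is out of scope). It assumes the description follows a shape like "12.3M Followers, 512 Following, 1,024 Posts - See Instagram photos and videos from ...". Instagram can change or localize that text at any time, which is why it makes sense as a last resort rather than a primary source.

```python
from __future__ import annotations

import re

# Assumed English og:description shape; K/M/B suffixes abbreviate large counts.
META_RE = re.compile(
    r"([\d.,]+[KMB]?)\s+Followers,\s*([\d.,]+[KMB]?)\s+Following,\s*([\d.,]+[KMB]?)\s+Posts",
    re.IGNORECASE,
)
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert '1,024', '4.5K' or '12.3M' into an integer."""
    text = text.replace(",", "").upper()
    if text and text[-1] in MULTIPLIERS:
        return int(round(float(text[:-1]) * MULTIPLIERS[text[-1]]))
    return int(float(text))


def parse_meta_description(content: str) -> dict | None:
    """Return follower/following/post counts, or None if the text has another shape."""
    match = META_RE.search(content)
    if not match:
        return None
    followers, following, posts = (parse_count(g) for g in match.groups())
    return {"followers": followers, "following": following, "posts": posts}
```

Abbreviated counts are inherently approximate ("12.3M" could be anything from roughly 12.25M to 12.35M), so a fallback like this is good enough for tiering but not for exact reporting.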
Example IGLead input
{
"profiles": ["therock", "cristiano", "https://www.instagram.com/kyliejenner/"],
"sessionId": "YOUR_SESSION_ID",
"minFollowers": 100000,
"minEngagementRate": 1.0,
"requireBusinessEmail": false,
"nicheKeywords": ["fitness", "health", "wellness"],
"maxProfilesPerRun": 50,
"proxyConfiguration": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}
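Because the profiles field mixes bare usernames and full profile URLs, and handles may also arrive with a leading "@", some normalization step is implied before any requests go out. Here is a minimal sketch. The accepted URL pattern and the 1 to 30 character username rule are assumptions, not the actor's documented validation.

```python
from __future__ import annotations

import re

# Assumptions: usernames are 1-30 characters of letters, digits, "." and "_",
# and profile URLs look like https://www.instagram.com/<username>/
USERNAME_RE = re.compile(r"^[A-Za-z0-9._]{1,30}$")
PROFILE_URL_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?instagram\.com/([A-Za-z0-9._]{1,30})/?(?:\?.*)?$",
    re.IGNORECASE,
)


def normalize_profile(raw: str) -> str:
    """Turn a username, @handle, or profile URL into a lowercase username."""
    value = raw.strip()
    match = PROFILE_URL_RE.match(value)
    if match:
        value = match.group(1)
    value = value.lstrip("@")
    if not USERNAME_RE.match(value):
        raise ValueError(f"Not a valid Instagram username or profile URL: {raw!r}")
    return value.lower()


def normalize_profiles(raw_profiles: list[str], max_profiles: int) -> list[str]:
    """Normalize, de-duplicate (keeping first-seen order), and cap at max_profiles."""
    unique = dict.fromkeys(normalize_profile(p) for p in raw_profiles)
    return list(unique)[:max_profiles]
```

Applying maxProfilesPerRun after de-duplication means a list with repeated entries still yields the full number of distinct profiles.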
What comes back
For each profile, IGLead can return:
- normalized profile data
- recent post stats
- average and median engagement metrics
- extracted business email if it is publicly visible in the bio
- niche match score
- leadScore
- a recommendation such as contact, review, or skip
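To show how those pieces might roll up into leadScore and a recommendation, here is a minimal weighted-sum sketch. The weights, the contact/review cut-offs, and treating requireBusinessEmail as a hard filter are assumptions for illustration; the actual IGLead formula is not spelled out here.

```python
from __future__ import annotations

# Hypothetical weights (they sum to 1.0) and recommendation cut-offs.
WEIGHTS = {"engagement": 0.4, "niche": 0.3, "email": 0.2, "verified": 0.1}
CONTACT_AT, REVIEW_AT = 0.7, 0.4


def clamp(x: float) -> float:
    """Keep a component inside [0, 1]."""
    return max(0.0, min(x, 1.0))


def lead_score(engagement_ratio: float, niche_score: float, has_email: bool, verified: bool) -> float:
    """engagement_ratio = engagement rate / tier floor; values above 1 are capped."""
    parts = {
        "engagement": clamp(engagement_ratio),
        "niche": clamp(niche_score),
        "email": 1.0 if has_email else 0.0,
        "verified": 1.0 if verified else 0.0,
    }
    return round(sum(WEIGHTS[k] * v for k, v in parts.items()), 3)


def recommend(score: float, has_email: bool, require_email: bool = False) -> str:
    """Map a score to contact / review / skip, optionally requiring a business email."""
    if require_email and not has_email:
        return "skip"
    if score >= CONTACT_AT:
        return "contact"
    if score >= REVIEW_AT:
        return "review"
    return "skip"
```

Normalizing engagement against the tier floor (rather than using the raw rate) keeps the score comparable between a micro creator and a mega influencer.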
For outreach teams, that means you can stop treating every Instagram profile as equal. You can prioritize the ones that actually fit your campaign and have the engagement to justify the spend.
2. IG_story_snapshot: monitoring story activity without owning the account
Stories are one of the hardest parts of Instagram to operationalize because they are temporary, fast-moving, and usually checked manually.
IG_story_snapshot is built to answer a very specific operational question:
Does this public profile have an active story right now, and what does that story set look like?
It tracks:
- whether a profile currently has an active story
- how many story frames are live
- image vs video composition
- oldest and newest story timestamps
- hours since the story sequence started
- hours left until expiry
- optional profile context such as follower count
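Most of these fields can be derived from frame timestamps alone. The sketch below assumes each live frame comes with a posting timestamp and a media type, and that stories last the standard 24 hours. Whether "hours left" should be measured from the newest frame (when the whole set disappears) or the oldest (when the first frame drops off) is a design choice; this sketch uses the newest. The output keys mirror the field names listed further down, but the computations are assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta

STORY_LIFETIME = timedelta(hours=24)  # standard lifetime of a story frame


def story_snapshot(frames: list[tuple[datetime, str]], now: datetime) -> dict:
    """frames: (posted_at, media_type) pairs, media_type being "image" or "video" (assumed shape)."""
    live = [(ts, kind) for ts, kind in frames if now - ts < STORY_LIFETIME]
    if not live:
        return {"active_story": False, "story_count": 0}
    times = [ts for ts, _ in live]
    oldest, newest = min(times), max(times)

    def hours(delta: timedelta) -> float:
        return round(delta.total_seconds() / 3600, 2)

    return {
        "active_story": True,
        "story_count": len(live),
        "image_count": sum(1 for _, kind in live if kind == "image"),
        "video_count": sum(1 for _, kind in live if kind == "video"),
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat(),
        "story_age_hours": hours(now - oldest),
        # Measured against the newest frame, i.e. when the whole set disappears.
        "hours_left_until_expiry": hours(newest + STORY_LIFETIME - now),
    }
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps scheduled runs reproducible and easy to test.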
This is useful for:
- competitor monitoring
- campaign verification
- event coverage tracking
- brand activity benchmarking
What I like most here is that it avoids trying to do too much. It does not pretend to give story view counts, and it does not download story content. It focuses on presence and metadata, which is the part most teams actually need for monitoring.
Example IG_story_snapshot input
{
"profiles": ["nike", "adidas", "@puma"],
"sessionId": "YOUR_SESSION_ID",
"includeProfileContext": true,
"maxProfilesPerRun": 50,
"maxRequestsPerMinute": 15,
"proxyConfiguration": {
"useApifyProxy": true
}
}
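The maxRequestsPerMinute field implies some form of request pacing. An actor built on Crawlee may well rely on the crawler's own throttling for this, so treat the following as a standalone illustration only: a hypothetical RequestPacer that spaces request starts at least 60 / maxRequestsPerMinute seconds apart (4 seconds for the value of 15 above). The clock and sleep functions are injectable so the behavior can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class RequestPacer:
    """Hypothetical helper: spaces request starts at least 60 / max_per_minute seconds apart."""

    def __init__(
        self,
        max_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

In a loop, calling `pacer.wait()` before each profile request is enough. Even spacing is stricter than a sliding-window limit (it never allows bursts), which is usually what you want when the goal is to avoid looking like a bot.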
What comes back
The output is centered around a few operational fields:
- active_story
- story_count
- story_frames
- story_metadata
- story_age_hours
- hours_left_until_expiry
That makes the API especially useful for scheduled runs. If you execute it every hour, you can build a clean timeline of who is posting stories, how often they post, and whether they lean more toward video or image content.
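For that kind of hourly schedule, a small aggregation over stored snapshots is enough to produce a per-profile timeline summary. This sketch assumes each snapshot row was stored with a profile name, active_story, and per-type frame counts (image_count and video_count are hypothetical field names). Because one frame can be seen by several consecutive hourly runs, raw frame totals are inflated, so the sketch reports ratios instead.

```python
from __future__ import annotations

from collections import defaultdict


def summarize_timeline(snapshots: list[dict]) -> dict[str, dict[str, float]]:
    """Per profile: share of runs with an active story, and share of seen frames that were video."""
    totals = defaultdict(lambda: {"runs": 0, "active_runs": 0, "images": 0, "videos": 0})
    for snap in snapshots:
        t = totals[snap["profile"]]
        t["runs"] += 1
        if snap.get("active_story"):
            t["active_runs"] += 1
            # A frame can be counted in several consecutive runs; ratios soften that effect.
            t["images"] += snap.get("image_count", 0)
            t["videos"] += snap.get("video_count", 0)
    summary = {}
    for profile, t in totals.items():
        frames = t["images"] + t["videos"]
        summary[profile] = {
            "active_ratio": t["active_runs"] / t["runs"],
            "video_share": t["videos"] / frames if frames else 0.0,
        }
    return summary
```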
3. IG_comment_lead: turning Instagram comments into lead intelligence
IG_comment_lead is the most directly sales-oriented API in the stack.
The idea is straightforward: people reveal intent in comments all the time. They ask about price, shipping, details, availability, or how to order. Most teams read those comments manually, if they read them at all.
This API takes Instagram post or reel URLs, fetches comments, and scores commenters based on lead relevance.
The pipeline includes:
- input validation for post and reel URLs
- authenticated scraping with sessionId
- fallback comment extraction strategies
- keyword-based intent scoring
- lightweight sentiment scoring
- spam checks
- deduplication by username
- early stopping once the target lead count is reached
- a final analytics summary for the whole run
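Several of these steps (keyword intent scoring, spam checks, deduplication by username, and early stopping) fit in a short loop. The sketch below is illustrative only: the keyword weights and spam patterns are invented, sentiment scoring is left out, and IG_comment_lead's real scoring is likely richer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical intent keywords and weights; a real model would be tuned on labeled comments.
INTENT_KEYWORDS = {"price": 0.4, "how much": 0.4, "shipping": 0.3, "order": 0.3, "available": 0.2, "link": 0.2}
SPAM_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"follow (me|back)", r"check my (page|profile)", r"https?://")]


@dataclass
class Lead:
    username: str
    text: str
    score: float


def intent_score(text: str) -> float:
    """Sum the weights of intent keywords found as whole words, capped at 1.0."""
    lowered = text.lower()
    total = sum(w for kw, w in INTENT_KEYWORDS.items() if re.search(rf"\b{re.escape(kw)}\b", lowered))
    return min(total, 1.0)


def is_spam(text: str) -> bool:
    return any(p.search(text) for p in SPAM_PATTERNS)


def extract_leads(comments: list[dict], min_lead_score: float = 0.6, target_leads: int = 30) -> list[Lead]:
    """Score comments in order, keep one lead per username, stop once target_leads is reached."""
    leads: list[Lead] = []
    seen: set[str] = set()
    for comment in comments:
        user, text = comment["username"], comment["text"]
        if user in seen or is_spam(text):
            continue
        score = intent_score(text)
        if score >= min_lead_score:
            seen.add(user)
            leads.append(Lead(user, text, round(score, 2)))
            if len(leads) >= target_leads:
                break  # early stop: no need to score the remaining comments
    return leads
```

Adding a username to `seen` only once it qualifies means a commenter whose first comment is low-intent can still become a lead from a later one.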
I also like that this API is optimized for cost control. You can cap comments per post, define a minimum lead score, and stop the run as soon as enough leads are found.
Example IG_comment_lead input
{
"postUrls": [
"https://www.instagram.com/p/C3xYz1234Ab/",
"https://www.instagram.com/reel/C3xYz5678Cd/"
],
"sessionId": "YOUR_SESSION_ID",
"cookie": "sessionid=...; csrftoken=...; ds_user_id=...;",
"maxCommentsPerPost": 500,
"targetLeads": 30,
"minLeadScore": 0.6,
"debugComments": false
}
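Input validation for this actor mostly comes down to recognizing post and reel URLs and checking that auth and limits look sane. Here is a minimal sketch against the example input above. The URL pattern, the "sessionId or cookie" rule, and the 0 to 1 range for minLeadScore are assumptions inferred from the example, not documented constraints.

```python
from __future__ import annotations

import re

# Assumed accepted shapes: instagram.com/p/<shortcode>/ and instagram.com/reel(s)/<shortcode>/
POST_URL_RE = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(p|reel|reels)/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def parse_post_url(url: str) -> tuple[str, str]:
    """Return ("post" or "reel", shortcode), or raise ValueError."""
    match = POST_URL_RE.match(url.strip())
    if not match:
        raise ValueError(f"Not an Instagram post or reel URL: {url!r}")
    kind = "post" if match.group(1) == "p" else "reel"
    return kind, match.group(2)


def validate_input(run_input: dict) -> list[tuple[str, str]]:
    """Hypothetical checks mirroring the example input fields."""
    urls = run_input.get("postUrls") or []
    if not urls:
        raise ValueError("postUrls must contain at least one URL")
    if not (run_input.get("sessionId") or run_input.get("cookie")):
        raise ValueError("a sessionId or cookie is required for authenticated scraping")
    min_score = run_input.get("minLeadScore", 0.6)
    if not 0 <= min_score <= 1:
        raise ValueError("minLeadScore must be between 0 and 1")
    if run_input.get("maxCommentsPerPost", 1) < 1 or run_input.get("targetLeads", 1) < 1:
        raise ValueError("maxCommentsPerPost and targetLeads must be positive")
    return [parse_post_url(u) for u in urls]
```

Failing fast on a bad URL before any request is made is a cheap way to avoid burning proxy traffic on a run that cannot succeed.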
What makes this one interesting
The comment fetch flow does not rely on a single endpoint. It tries multiple strategies:
- GraphQL queries
- shortcode-based REST endpoints
- mobile-style REST endpoints
That gives it a better chance of surviving endpoint instability.
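The fallback pattern itself is generic. Here is a minimal sketch of an ordered strategy chain, where each strategy is any function that takes a shortcode and returns a list of comments. The names and the error handling are assumptions; the GraphQL and REST fetchers themselves are not shown.

```python
from typing import Callable, List, Sequence, Tuple

Fetcher = Callable[[str], List[dict]]


def fetch_with_fallback(shortcode: str, strategies: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, List[dict]]:
    """Try each (name, fetcher) in order; return the first non-empty result and its strategy name."""
    errors = []
    for name, fetch in strategies:
        try:
            comments = fetch(shortcode)
        except Exception as exc:  # blocked, changed schema, network error, ...
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
            continue
        if comments:
            return name, comments
        errors.append(f"{name}: no comments returned")
    # Note: a post with genuinely zero comments also ends up here.
    raise RuntimeError(f"All comment strategies failed for {shortcode}: " + "; ".join(errors))
```

Returning the name of the strategy that worked is useful for debugging: if one endpoint starts failing consistently, the logs show it before the whole chain breaks.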
On top of extraction, it enriches leads with:
- buyer_intent_score
- engagement_score
- likely customer flag
- extracted keywords
- inferred niche
- inferred geography
At the end of a run, it also pushes an analytics summary with:
- total comments processed
- total leads found
- lead rate
- top commenters
- intent distribution
- sentiment distribution
- top keywords
- per-post breakdown
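Most of that summary is counting. The sketch below builds a similar record from processed comments and scored leads. The input field names (post, username, intent, sentiment, keywords) and the top-N cut-off are assumptions, not the actor's actual output schema.

```python
from __future__ import annotations

from collections import Counter


def build_summary(comments: list[dict], leads: list[dict], top_n: int = 5) -> dict:
    """comments: dicts with "post" and "username"; leads: dicts that also carry
    "intent", "sentiment" and "keywords" (assumed field names)."""
    total = len(comments)
    comments_per_post = Counter(c["post"] for c in comments)
    leads_per_post = Counter(lead["post"] for lead in leads)
    return {
        "total_comments_processed": total,
        "total_leads_found": len(leads),
        "lead_rate": round(len(leads) / total, 4) if total else 0.0,
        "top_commenters": Counter(c["username"] for c in comments).most_common(top_n),
        "intent_distribution": dict(Counter(lead["intent"] for lead in leads)),
        "sentiment_distribution": dict(Counter(lead["sentiment"] for lead in leads)),
        "top_keywords": Counter(kw for lead in leads for kw in lead["keywords"]).most_common(top_n),
        "per_post": {
            post: {"comments": n, "leads": leads_per_post.get(post, 0)}
            for post, n in comments_per_post.items()
        },
    }
```

Top commenters are counted over all processed comments rather than leads, since leads are already de-duplicated by username and would all count once.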
That means the output is useful for both direct lead capture and campaign analysis.
How the three APIs work together
The fun part is not each API in isolation. It is the workflow they create together.
A practical sequence looks like this:
- Use IGLead to qualify creators, influencers, or niche accounts before outreach.
- Use IG_story_snapshot to monitor who is actively posting stories right now.
- Use IG_comment_lead on posts and reels in your niche to surface warm demand from commenters.
That gives you three different layers of Instagram intelligence:
- profile quality
- current activity
- audience intent
In other words, you can answer:
- Who should I contact?
- Who is active right now?
- Who is already asking buying questions?
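The first two steps of that sequence can be glued together with very little code. The sketch below assumes each actor's result rows have already been downloaded as Python dicts. The username key and treating active_story as a per-row boolean are assumptions, while leadScore, recommendation, and the input field names come from the examples in this post.

```python
from __future__ import annotations


def story_input_from_leads(iglead_results: list[dict], max_profiles: int = 50) -> dict:
    """Build an IG_story_snapshot-style input from IGLead rows recommended for contact."""
    profiles = [r["username"] for r in iglead_results if r.get("recommendation") == "contact"]
    return {
        "profiles": profiles[:max_profiles],
        "includeProfileContext": True,
        "maxProfilesPerRun": max_profiles,
    }


def prioritize(iglead_results: list[dict], story_results: list[dict]) -> list[str]:
    """Order contact-worthy profiles: active story first, then by descending leadScore."""
    active = {s["username"] for s in story_results if s.get("active_story")}
    contacts = [r for r in iglead_results if r.get("recommendation") == "contact"]
    ranked = sorted(contacts, key=lambda r: (r["username"] not in active, -r["leadScore"]))
    return [r["username"] for r in ranked]
```

Sorting on a (not active, negative score) tuple puts currently active accounts first, on the assumption that an account posting stories right now is more likely to respond to outreach.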
Implementation notes
All three projects are built around the same practical philosophy:
- use Apify actors for deployment and scheduling
- use Crawlee for request orchestration
- use Playwright when browser context is needed
- keep concurrency controlled to reduce blocks
- use fallback strategies instead of trusting one endpoint
- support session cookies when Instagram requires authentication
- preserve debug artifacts when extraction fails
For IGLead, that means debug HTML and screenshots when profile parsing breaks.
For IG_story_snapshot, that means API-first story detection with a visual fallback for story presence.
For IG_comment_lead, that means endpoint fallback plus a summary record at the end so a run is not just raw data, but something closer to decision-ready output.
What I would improve next
If I keep iterating on this stack, these are the next areas I would push:
- cross-actor orchestration so leads can move automatically from one API to the next
- historical storage for story activity trends over weeks instead of single snapshots
- richer commenter enrichment for repeat engagement across multiple posts
- better dashboarding on top of the analytics summary
The core scraping part is useful, but the real leverage comes from building a repeatable operating system around it.
Final thoughts
The biggest lesson from building IGLead, IG_story_snapshot, and IG_comment_lead is that Instagram automation becomes much more useful when you stop thinking in terms of "scrape a page" and start thinking in terms of "answer a business question."
Each API here is narrow on purpose:
- IGLead answers qualification
- IG_story_snapshot answers activity
- IG_comment_lead answers intent
Put together, they form a lightweight Instagram intelligence stack for outreach, competitor research, and lead generation.
If you are building in this space, I would strongly recommend resisting the urge to turn everything into one monolith. Small, composable APIs are easier to trust, easier to debug, and much easier to turn into real workflows.
You can find all of my APIs here: https://apify.com/store/categories?search=seralifatih