DEV Community

Fatih İlhan


How I Built an Instagram Intelligence Suite for IG Growth

Instagram research usually breaks down in the same three places:

  1. You find profiles, but you still have to qualify them manually.
  2. You know a brand or influencer is active, but you cannot track story behavior cleanly.
  3. You see high-intent comments under posts and reels, but nobody turns them into structured leads.

That is exactly why I split the workflow into three focused APIs instead of trying to force everything into one oversized scraper:

  • IGLead for profile qualification
  • IG_story_snapshot for story activity monitoring
  • IG_comment_lead for comment-to-lead extraction

All three are built around Apify actors, but the bigger idea is simple: treat Instagram intelligence like a pipeline, not a one-off scrape.

Why I split this into three APIs

A lot of Instagram tools try to do discovery, enrichment, monitoring, and lead scoring in one place. That sounds convenient until the inputs, auth requirements, and output formats start fighting each other.

I wanted each API to answer one clean question:

  • IGLead: Is this profile worth contacting?
  • IG_story_snapshot: Is this profile active on stories right now?
  • IG_comment_lead: Which commenters look like real demand?

That separation makes the stack easier to maintain, easier to schedule, and easier to plug into downstream automations.

1. IGLead: qualifying influencer and creator profiles before outreach

IGLead starts with a list of Instagram usernames or profile URLs and turns them into scored outreach candidates.

Instead of just scraping follower counts, it combines multiple signals:

  • follower count
  • recent post engagement
  • engagement rate
  • business email detection from public bio text
  • niche keyword matching
  • verification status
  • a final lead score and recommendation
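Here is a minimal Python sketch of two of those signals: average and median engagement rate over recent posts. It assumes the per-post engagement rate is (likes + comments) / followers * 100 and that each post record has `likes` and `comments` fields. Both the formula and the field names are assumptions for illustration, not IGLead's actual schema.

```python
from statistics import mean, median


def engagement_metrics(followers: int, posts: list[dict]) -> dict:
    """Average and median engagement rate (percent) across recent posts.

    Assumption: per-post rate = (likes + comments) / followers * 100,
    and each post dict has hypothetical "likes" and "comments" keys.
    """
    if followers <= 0 or not posts:
        return {"avg_engagement_rate": 0.0, "median_engagement_rate": 0.0}
    rates = [(p["likes"] + p["comments"]) / followers * 100 for p in posts]
    return {
        "avg_engagement_rate": round(mean(rates), 2),
        "median_engagement_rate": round(median(rates), 2),
    }


recent_posts = [
    {"likes": 900, "comments": 100},   # 1.0%
    {"likes": 1900, "comments": 100},  # 2.0%
    {"likes": 400, "comments": 100},   # 0.5%
]
print(engagement_metrics(100_000, recent_posts))
# {'avg_engagement_rate': 1.17, 'median_engagement_rate': 1.0}
```

Reporting the median alongside the mean keeps a single viral post from inflating an account's apparent engagement.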

One thing I especially like in this API is that the scoring is not flat. Engagement expectations change depending on account size. A micro creator should not be evaluated like a mega influencer, so the actor adjusts its thresholds by tier.
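Tier-adjusted scoring can be sketched in a few lines. The tier boundaries and expected engagement rates below are made-up placeholders for illustration, because the post does not publish IGLead's actual thresholds.

```python
# Placeholder tiers: (exclusive follower upper bound, tier name, expected engagement %).
# These numbers are illustrative assumptions, not IGLead's real thresholds.
TIERS = [
    (10_000, "nano", 4.0),
    (100_000, "micro", 2.5),
    (1_000_000, "macro", 1.5),
    (float("inf"), "mega", 0.8),
]


def tier_for(followers: int) -> tuple[str, float]:
    """Return (tier name, expected engagement rate) for a follower count."""
    for upper_bound, name, expected_rate in TIERS:
        if followers < upper_bound:
            return name, expected_rate
    raise ValueError("unreachable: the last tier is unbounded")


def engagement_score(followers: int, engagement_rate: float) -> float:
    """Score in [0, 1]: the account's engagement rate relative to its tier's expectation."""
    _, expected_rate = tier_for(followers)
    return min(engagement_rate / expected_rate, 1.0)


# The same 2% engagement rate is strong for a mega account but weak for a nano one.
print(engagement_score(5_000_000, 2.0))  # 1.0
print(engagement_score(5_000, 2.0))      # 0.5
```

Capping the ratio at 1.0 stops an unusually high engagement rate from drowning out the other signals when they are combined into a lead score.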

It also uses multiple extraction paths for reliability:

  • Instagram web profile API
  • feed endpoints when timeline media is incomplete
  • HTML parsing fallbacks
  • meta tag parsing for follower and post counts

That matters because Instagram is rarely stable enough for a single-method scraper.
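The fallback idea can be sketched as an ordered list of extractor functions that are tried until one returns usable data. The extractors below are hypothetical stand-ins that simulate a blocked endpoint, a partial HTML parse, and a successful meta tag parse. They do not call Instagram.

```python
from typing import Callable, Optional

# An extractor takes a username and returns profile data, or None if it found nothing.
Extractor = Callable[[str], Optional[dict]]


def extract_profile(username: str, extractors: list[tuple[str, Extractor]]) -> dict:
    """Try each extraction path in order and return the first usable result.

    "Usable" here means the result includes a follower count. Each failure is
    recorded so a debug record can show which paths broke and why.
    """
    errors: dict[str, str] = {}
    for name, extract in extractors:
        try:
            data = extract(username)
        except Exception as exc:  # blocked requests, parse errors, etc.
            errors[name] = repr(exc)
            continue
        if data and data.get("followers") is not None:
            return {**data, "source": name, "errors": errors}
        errors[name] = "incomplete result"
    raise RuntimeError(f"all extraction paths failed for {username!r}: {errors}")


# Hypothetical stand-ins for the real extraction paths.
def web_profile_api(username: str) -> Optional[dict]:
    raise ConnectionError("429 Too Many Requests")  # simulate a block


def html_fallback(username: str) -> Optional[dict]:
    return {"username": username}  # simulate a partial parse with no follower count


def meta_tag_fallback(username: str) -> Optional[dict]:
    return {"username": username, "followers": 12_345}


result = extract_profile(
    "example_user",
    [("web_api", web_profile_api), ("html", html_fallback), ("meta", meta_tag_fallback)],
)
print(result["source"], result["followers"])  # meta 12345
```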

Example IGLead input

{
  "profiles": ["therock", "cristiano", "https://www.instagram.com/kyliejenner/"],
  "sessionId": "YOUR_SESSION_ID",
  "minFollowers": 100000,
  "minEngagementRate": 1.0,
  "requireBusinessEmail": false,
  "nicheKeywords": ["fitness", "health", "wellness"],
  "maxProfilesPerRun": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

What comes back

For each profile, IGLead can return:

  • normalized profile data
  • recent post stats
  • average and median engagement metrics
  • extracted business email if it is publicly visible in the bio
  • niche match score
  • leadScore
  • recommendation such as contact, review, or skip

For outreach teams, that means you can stop treating every Instagram profile as equal. You can prioritize the ones that actually fit your campaign and have the engagement to justify the spend.
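Plain regular expressions can approximate the business email and niche match signals. This sketch assumes the niche score is the fraction of campaign keywords that appear in the bio. That definition and the simple email pattern are assumptions, not IGLead's documented logic.

```python
import re
from typing import Optional

# Deliberately simple: catches name@domain.tld, misses obfuscated forms like "name [at] domain".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def business_email(bio: str) -> Optional[str]:
    """Return the first email-like string in a public bio, or None."""
    match = EMAIL_RE.search(bio)
    return match.group(0) if match else None


def niche_match_score(bio: str, keywords: list[str]) -> float:
    """Fraction of niche keywords that appear as whole words in the bio."""
    if not keywords:
        return 0.0
    text = bio.lower()
    hits = sum(1 for kw in keywords if re.search(rf"\b{re.escape(kw.lower())}\b", text))
    return hits / len(keywords)


bio = "Fitness coach | Health tips daily. Collabs: team@example.com"
print(business_email(bio))                                                  # team@example.com
print(round(niche_match_score(bio, ["fitness", "health", "wellness"]), 2))  # 0.67
```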

2. IG_story_snapshot: monitoring story activity without owning the account

Stories are one of the hardest parts of Instagram to operationalize because they are temporary, fast-moving, and usually checked manually.

IG_story_snapshot is built to answer a very specific operational question:

Does this public profile have an active story right now, and what does that story set look like?

It tracks:

  • whether a profile currently has an active story
  • how many story frames are live
  • image vs video composition
  • oldest and newest story timestamps
  • hours since the story sequence started
  • hours left until expiry
  • optional profile context such as follower count
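Frame timestamps alone are enough to derive these timing fields, because Instagram stories disappear 24 hours after they are posted. This sketch assumes timestamps arrive as Unix seconds and treats the story set as expiring when its newest frame does. Both are assumptions about how IG_story_snapshot defines these fields.

```python
from datetime import datetime, timedelta, timezone

STORY_LIFETIME = timedelta(hours=24)  # Instagram stories disappear after 24 hours


def story_timing(frame_timestamps: list[int], now: datetime) -> dict:
    """Derive story-set timing fields from frame Unix timestamps (seconds, UTC)."""
    if not frame_timestamps:
        return {"active_story": False, "story_count": 0}
    oldest = datetime.fromtimestamp(min(frame_timestamps), tz=timezone.utc)
    newest = datetime.fromtimestamp(max(frame_timestamps), tz=timezone.utc)
    age = now - oldest
    # Assumption: the set stays live until its newest frame expires.
    left = newest + STORY_LIFETIME - now
    return {
        "active_story": left > timedelta(0),
        "story_count": len(frame_timestamps),
        "story_age_hours": round(age.total_seconds() / 3600, 1),
        "hours_left_until_expiry": round(max(left.total_seconds(), 0.0) / 3600, 1),
    }


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
frames = [int((now - timedelta(hours=h)).timestamp()) for h in (20, 5, 2)]
print(story_timing(frames, now))
# {'active_story': True, 'story_count': 3, 'story_age_hours': 20.0, 'hours_left_until_expiry': 22.0}
```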

This is useful for:

  • competitor monitoring
  • campaign verification
  • event coverage tracking
  • brand activity benchmarking

What I like most here is that it avoids trying to do too much. It does not pretend to give story view counts, and it does not download story content. It focuses on presence and metadata, which is the part most teams actually need for monitoring.

Example IG_story_snapshot input

{
  "profiles": ["nike", "adidas", "@puma"],
  "sessionId": "YOUR_SESSION_ID",
  "includeProfileContext": true,
  "maxProfilesPerRun": 50,
  "maxRequestsPerMinute": 15,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

What comes back

The output is centered around a few operational fields:

  • active_story
  • story_count
  • story_frames
  • story_metadata
  • story_age_hours
  • hours_left_until_expiry

That makes the API especially useful for scheduled runs. If you execute it every hour, you can build a clean timeline of who is posting stories, how often they post, and whether they lean more toward video or image content.
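The sketch below shows one way to turn those hourly snapshots into a per-profile summary. The snapshot record shape (`profile`, `active_story`, and a `frames` list with `id` and `media_type`) is hypothetical. A frame stays live for up to 24 hours, so the same frame shows up in many hourly runs and has to be deduplicated before you compute the video/image mix.

```python
from collections import defaultdict


def summarize_timeline(snapshots: list[dict]) -> dict[str, dict]:
    """Aggregate hourly snapshot records into per-profile activity stats."""
    checks: dict[str, int] = defaultdict(int)
    active: dict[str, int] = defaultdict(int)
    frames: dict[str, dict] = defaultdict(dict)  # profile -> {frame_id: media_type}
    for snap in snapshots:
        profile = snap["profile"]
        checks[profile] += 1
        if snap["active_story"]:
            active[profile] += 1
        for frame in snap.get("frames", []):
            # The same frame appears in many hourly snapshots; keep it once.
            frames[profile][frame["id"]] = frame["media_type"]
    summary = {}
    for profile in checks:
        types = list(frames[profile].values())
        summary[profile] = {
            "active_ratio": round(active[profile] / checks[profile], 2),
            "unique_frames": len(types),
            "video_share": round(types.count("video") / len(types), 2) if types else 0.0,
        }
    return summary


snapshots = [
    {"profile": "nike", "active_story": True, "frames": [{"id": "a", "media_type": "video"}]},
    {"profile": "nike", "active_story": True,
     "frames": [{"id": "a", "media_type": "video"}, {"id": "b", "media_type": "image"}]},
    {"profile": "nike", "active_story": False, "frames": []},
    {"profile": "adidas", "active_story": False, "frames": []},
]
print(summarize_timeline(snapshots)["nike"])
# {'active_ratio': 0.67, 'unique_frames': 2, 'video_share': 0.5}
```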

3. IG_comment_lead: turning Instagram comments into lead intelligence

IG_comment_lead is the most directly sales-oriented API in the stack.

The idea is straightforward: people reveal intent in comments all the time. They ask about price, shipping, details, availability, or how to order. Most teams read those comments manually, if they read them at all.

This API takes Instagram post or reel URLs, fetches comments, and scores commenters based on lead relevance.

The pipeline includes:

  • input validation for post and reel URLs
  • authenticated scraping with sessionId
  • fallback comment extraction strategies
  • keyword-based intent scoring
  • lightweight sentiment scoring
  • spam checks
  • deduplication by username
  • early stopping once the target lead count is reached
  • a final analytics summary for the whole run
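The input validation step could look roughly like this. It accepts only `/p/` and `/reel/` links, pulls out the shortcode, normalizes the URL, and drops duplicates. The regex and the normalized URL format are assumptions for illustration, not the actor's actual validation rules.

```python
import re

# Accepts https://www.instagram.com/p/<shortcode>/ and /reel/<shortcode>/ links,
# with or without "www." and with an optional query string.
POST_URL_RE = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(p|reel)/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def validate_post_urls(urls: list[str]) -> tuple[list[dict], list[str]]:
    """Split input URLs into normalized valid posts/reels and rejected entries."""
    valid: list[dict] = []
    rejected: list[str] = []
    seen: set[str] = set()
    for url in urls:
        match = POST_URL_RE.match(url.strip())
        if not match:
            rejected.append(url)
            continue
        kind, shortcode = match.groups()
        if shortcode in seen:
            continue  # same post listed twice (e.g. once with tracking params)
        seen.add(shortcode)
        valid.append({
            "kind": kind,
            "shortcode": shortcode,
            "url": f"https://www.instagram.com/{kind}/{shortcode}/",
        })
    return valid, rejected


valid, rejected = validate_post_urls([
    "https://www.instagram.com/p/C3xYz1234Ab/",
    "https://www.instagram.com/reel/C3xYz5678Cd/",
    "https://www.instagram.com/p/C3xYz1234Ab/?utm_source=share",  # duplicate
    "https://www.instagram.com/nike/",                            # a profile, not a post
])
print([v["shortcode"] for v in valid])  # ['C3xYz1234Ab', 'C3xYz5678Cd']
print(rejected)                          # ['https://www.instagram.com/nike/']
```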

I also like that this API is optimized for cost control. You can cap comments per post, define a minimum lead score, and stop the run as soon as enough leads are found.
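Here is a sketch of how those three controls could work together, with parameters that mirror the `maxCommentsPerPost`, `targetLeads`, and `minLeadScore` inputs. It assumes comments arrive as lazy iterators, one per post, so that stopping early actually avoids fetching the remaining comments. The comment stream and the toy scoring function are hypothetical stand-ins.

```python
from itertools import islice
from typing import Callable, Iterable


def collect_leads(
    posts: Iterable[Iterable[dict]],
    score: Callable[[str], float],
    max_comments_per_post: int = 500,
    target_leads: int = 30,
    min_lead_score: float = 0.6,
) -> list[dict]:
    """Scan comments post by post, keep one lead per username, stop at the target."""
    leads: list[dict] = []
    seen_users: set[str] = set()
    for comments in posts:
        # islice stops pulling from the iterator once the per-post cap is reached.
        for comment in islice(comments, max_comments_per_post):
            user = comment["username"]
            if user in seen_users:
                continue  # deduplicate by username
            lead_score = score(comment["text"])
            if lead_score >= min_lead_score:
                seen_users.add(user)
                leads.append({"username": user, "text": comment["text"], "lead_score": lead_score})
                if len(leads) >= target_leads:
                    return leads  # early stop: later comments and posts are never fetched
    return leads


fetch_log: list[tuple[str, str]] = []


def fake_comment_stream(post_id: str, comments: list[dict]):
    """Hypothetical stand-in for a paginated fetch; logs every comment pulled."""
    for comment in comments:
        fetch_log.append((post_id, comment["username"]))
        yield comment


def toy_score(text: str) -> float:
    return 1.0 if any(k in text.lower() for k in ("price", "order", "shipping")) else 0.0


posts = [
    fake_comment_stream("p1", [
        {"username": "ana", "text": "Price?"},
        {"username": "bo", "text": "Nice shot"},
        {"username": "ana", "text": "How do I order?"},
    ]),
    fake_comment_stream("p2", [
        {"username": "cy", "text": "Shipping to Berlin?"},
        {"username": "dee", "text": "price pls"},
    ]),
]
leads = collect_leads(posts, toy_score, target_leads=2)
print([lead["username"] for lead in leads])  # ['ana', 'cy']
print(len(fetch_log))                          # 4 ("dee" was never fetched)
```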

Example IG_comment_lead input

{
  "postUrls": [
    "https://www.instagram.com/p/C3xYz1234Ab/",
    "https://www.instagram.com/reel/C3xYz5678Cd/"
  ],
  "sessionId": "YOUR_SESSION_ID",
  "cookie": "sessionid=...; csrftoken=...; ds_user_id=...;",
  "maxCommentsPerPost": 500,
  "targetLeads": 30,
  "minLeadScore": 0.6,
  "debugComments": false
}

What makes this one interesting

The comment fetch flow does not rely on a single endpoint. It tries multiple strategies:

  • GraphQL queries
  • shortcode-based REST endpoints
  • mobile-style REST endpoints

That gives it a better chance of surviving endpoint instability.

On top of extraction, it enriches leads with:

  • buyer_intent_score
  • engagement_score
  • likely customer flag
  • extracted keywords
  • inferred niche
  • inferred geography
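Here is a minimal sketch of keyword-based intent scoring with a spam check. It produces fields named after the enrichment outputs listed above. The keyword weights, the spam patterns, and the 0.6 likely-customer cutoff are invented placeholders. The cutoff only mirrors the `minLeadScore` in the example input, and a production keyword list would be tuned per niche and language.

```python
import re

# Hypothetical weights: phrases that usually signal purchase intent.
INTENT_KEYWORDS = {
    "price": 0.4, "how much": 0.4, "order": 0.35, "buy": 0.35,
    "shipping": 0.3, "available": 0.25, "link": 0.15,
}
# Hypothetical spam patterns: self-promotion and drive-by links.
SPAM_PATTERNS = [r"follow\s*(me|back)", r"check\s*(out\s*)?my\s*(page|bio)", r"https?://"]


def score_comment(text: str, likely_customer_cutoff: float = 0.6) -> dict:
    """Score one comment for buyer intent using weighted keyword matches."""
    lowered = text.lower()
    keywords = [kw for kw in INTENT_KEYWORDS if re.search(rf"\b{re.escape(kw)}\b", lowered)]
    is_spam = any(re.search(pattern, lowered) for pattern in SPAM_PATTERNS)
    # Spam zeroes out intent; otherwise sum the keyword weights and cap at 1.0.
    intent = 0.0 if is_spam else min(sum(INTENT_KEYWORDS[kw] for kw in keywords), 1.0)
    return {
        "buyer_intent_score": round(intent, 2),
        "extracted_keywords": keywords,
        "is_spam": is_spam,
        "likely_customer": intent >= likely_customer_cutoff,
    }


print(score_comment("How much? Is shipping to Canada available?"))
# {'buyer_intent_score': 0.95, 'extracted_keywords': ['how much', 'shipping', 'available'], 'is_spam': False, 'likely_customer': True}
print(score_comment("Great price! Follow me back")["buyer_intent_score"])  # 0.0
```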

At the end of a run, it also pushes an analytics summary with:

  • total comments processed
  • total leads found
  • lead rate
  • top commenters
  • intent distribution
  • sentiment distribution
  • top keywords
  • per-post breakdown

That means the output is useful for both direct lead capture and campaign analysis.
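The end-of-run summary could be assembled from per-post comment counts and the scored leads, roughly as in the sketch below. The record fields (`post_url`, `buyer_intent_score`, `extracted_keywords`) and the intent buckets are assumptions for illustration. Sentiment distribution and top commenters would follow the same `Counter` pattern.

```python
from collections import Counter


def run_summary(comments_processed: dict[str, int], leads: list[dict]) -> dict:
    """Build a run-level analytics record.

    comments_processed maps each post URL to the number of comments scanned;
    each lead is a dict with hypothetical post_url, buyer_intent_score and
    extracted_keywords fields.
    """
    total = sum(comments_processed.values())

    def bucket(score: float) -> str:
        # Hypothetical cutoffs for the intent distribution.
        return "high" if score >= 0.8 else "medium" if score >= 0.6 else "low"

    leads_per_post = Counter(lead["post_url"] for lead in leads)
    keyword_counts = Counter(kw for lead in leads for kw in lead["extracted_keywords"])
    return {
        "total_comments_processed": total,
        "total_leads_found": len(leads),
        "lead_rate": len(leads) / total if total else 0.0,
        "intent_distribution": dict(Counter(bucket(lead["buyer_intent_score"]) for lead in leads)),
        "top_keywords": keyword_counts.most_common(3),
        "per_post": {
            url: {"comments": n, "leads": leads_per_post[url]}
            for url, n in comments_processed.items()
        },
    }


summary = run_summary(
    {"post_a": 200, "post_b": 300},
    [
        {"post_url": "post_a", "buyer_intent_score": 0.9, "extracted_keywords": ["price", "order"]},
        {"post_url": "post_a", "buyer_intent_score": 0.65, "extracted_keywords": ["price"]},
        {"post_url": "post_b", "buyer_intent_score": 0.7, "extracted_keywords": ["shipping"]},
    ],
)
print(summary["lead_rate"], summary["intent_distribution"], summary["top_keywords"][0])
# 0.006 {'high': 1, 'medium': 2} ('price', 2)
```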

How the three APIs work together

The fun part is not each API in isolation. It is the workflow they create together.

A practical sequence looks like this:

  1. Use IGLead to qualify creators, influencers, or niche accounts before outreach.
  2. Use IG_story_snapshot to monitor who is actively posting stories right now.
  3. Use IG_comment_lead on posts and reels in your niche to surface warm demand from commenters.

That gives you three different layers of Instagram intelligence:

  • profile quality
  • current activity
  • audience intent

In other words, you can answer:

  • Who should I contact?
  • Who is active right now?
  • Who is already asking buying questions?
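Chaining the first two steps mostly means reshaping one actor's output into the next actor's input. This sketch builds an IG_story_snapshot input, using the keys from the example input earlier, out of IGLead results. It assumes each IGLead item carries `leadScore` and `recommendation` as described above, plus a `username` field. That field name is a guess, not the documented output schema, and the score values are made up.

```python
def story_input_from_leads(
    iglead_items: list[dict], session_id: str, max_profiles: int = 50
) -> dict:
    """Turn IGLead results into an IG_story_snapshot run input.

    Keeps only profiles IGLead recommends contacting, highest leadScore first.
    """
    to_contact = [item for item in iglead_items if item.get("recommendation") == "contact"]
    to_contact.sort(key=lambda item: item.get("leadScore", 0), reverse=True)
    return {
        "profiles": [item["username"] for item in to_contact[:max_profiles]],
        "sessionId": session_id,
        "includeProfileContext": True,
        "maxProfilesPerRun": max_profiles,
    }


iglead_items = [
    {"username": "creator_a", "leadScore": 72, "recommendation": "contact"},
    {"username": "creator_b", "leadScore": 88, "recommendation": "contact"},
    {"username": "creator_c", "leadScore": 41, "recommendation": "skip"},
    {"username": "creator_d", "leadScore": 60, "recommendation": "review"},
]
print(story_input_from_leads(iglead_items, "YOUR_SESSION_ID")["profiles"])
# ['creator_b', 'creator_a']
```

You could then pass the resulting dict as the run input when starting the story actor, for example with `client.actor(...).call(run_input=...)` from the Python `apify-client` package, with the actor ID filled in.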

Implementation notes

All three projects are built around the same practical philosophy:

  • use Apify actors for deployment and scheduling
  • use Crawlee for request orchestration
  • use Playwright when browser context is needed
  • keep concurrency controlled to reduce blocks
  • use fallback strategies instead of trusting one endpoint
  • support session cookies when Instagram requires authentication
  • preserve debug artifacts when extraction fails
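Controlled concurrency can be as simple as a sliding-window request budget, similar in spirit to the `maxRequestsPerMinute` setting in the IG_story_snapshot input. This standalone Python class is an illustrative assumption, not the actors' code. In a Crawlee project you would normally rely on the crawler's built-in concurrency settings rather than write your own limiter.

```python
import time
from collections import deque
from typing import Callable


class RequestsPerMinuteLimiter:
    """Allow at most `max_per_minute` requests in any rolling 60-second window."""

    def __init__(
        self,
        max_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent: deque = deque()  # timestamps of requests in the current window

    def wait(self) -> None:
        """Block until a request is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have fallen out of the 60-second window.
            while self.sent and now - self.sent[0] >= 60:
                self.sent.popleft()
            if len(self.sent) < self.max_per_minute:
                self.sent.append(now)
                return
            # Sleep until the oldest request in the window expires.
            self.sleep(60 - (now - self.sent[0]))


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


clock = FakeClock()
limiter = RequestsPerMinuteLimiter(15, clock=clock.now, sleep=clock.sleep)
for _ in range(16):
    limiter.wait()  # the 16th call has to wait a full minute
print(clock.t)  # 60.0
```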

For IGLead, that means debug HTML and screenshots when profile parsing breaks.

For IG_story_snapshot, that means API-first story detection with a visual fallback for story presence.

For IG_comment_lead, that means endpoint fallback plus a summary record at the end, so a run produces not just raw data but something closer to decision-ready output.

What I would improve next

If I keep iterating on this stack, these are the next areas I would push:

  • cross-actor orchestration so leads can move automatically from one API to the next
  • historical storage for story activity trends over weeks instead of single snapshots
  • richer commenter enrichment for repeat engagement across multiple posts
  • better dashboarding on top of the analytics summary

The core scraping part is useful, but the real leverage comes from building a repeatable operating system around it.

Final thoughts

The biggest lesson from building IGLead, IG_story_snapshot, and IG_comment_lead is that Instagram automation becomes much more useful when you stop thinking in terms of "scrape a page" and start thinking in terms of "answer a business question."

Each API here is narrow on purpose:

  • IGLead answers qualification
  • IG_story_snapshot answers activity
  • IG_comment_lead answers intent

Put together, they form a lightweight Instagram intelligence stack for outreach, competitor research, and lead generation.

If you are building in this space, I would strongly recommend resisting the urge to turn everything into one monolith. Small, composable APIs are easier to trust, easier to debug, and much easier to turn into real workflows.

You can find all of my APIs here: https://apify.com/store/categories?search=seralifatih
