DEV Community

George Kioko

I Built an AI Powered Influencer Finder That Costs Almost Nothing to Run

#ai

Most influencer discovery tools charge $200-500/month. I built one that costs almost nothing to run and finds real influencer profiles, with names, follower counts, bios, and emails, across Instagram, TikTok, and YouTube.

Here's exactly how it works, what broke along the way, and the architecture that finally made it reliable.

The Problem

A brand asked me to find 50 fitness micro-influencers on Instagram with contact info. The options were:

  • Upfluence: $478/month minimum
  • Modash: $299/month
  • Manual research: 3 hours on Instagram, copy-pasting into a spreadsheet

I figured I could automate this for pennies.

The Architecture (What Actually Works)

After two failed attempts, here's what stuck:

Google SERP search (via Apify GOOGLE_SERP proxy)
  -> Extract social profile URLs from search results
    -> HTTP fetch each profile (via Apify residential proxy + Googlebot UA)
      -> Parse OG meta tags for real names, follower counts, bios
        -> Output structured data

The key insight: you don't need to render Instagram pages in a browser. Instagram serves complete Open Graph meta tags to Googlebot. A simple HTTP GET with the right User-Agent through a residential proxy returns everything you need.
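Here's a minimal Python sketch of that fetch, not the actor's actual code. It uses only the standard library; `fetch_og_tags`, `extract_og_tags`, and the `proxy_url` parameter are names I made up for illustration, and whether a given site keeps serving OG tags to this User-Agent is something you'd have to verify yourself.

```python
import urllib.request
from html.parser import HTMLParser

# Googlebot's published User-Agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class OGParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first occurrence of each property.
            self.tags.setdefault(prop, attr["content"])


def extract_og_tags(html):
    """Return every og:* meta tag found in an HTML string."""
    parser = OGParser()
    parser.feed(html)
    return parser.tags


def fetch_og_tags(url, proxy_url=None, timeout=20):
    """Plain HTTP GET with a Googlebot UA, optionally through a proxy (hypothetical helper)."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with opener.open(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_og_tags(html)
```

`extract_og_tags` works on any HTML string, so the parsing can be tested offline without touching the network.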

For example, fetching https://www.instagram.com/kayla_itsines/ with a Googlebot header returns:

og:title: "KAYLA ITSINES (@kayla_itsines). Instagram photos and videos"
og:description: "16M Followers, 845 Following, 8,977 Posts"

Real name, follower count, post count. No browser. No login. No CAPTCHA.
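Turning those two strings into structured fields is mostly regex work. A rough sketch, assuming the title and description keep the shape shown above (the function names are mine, and the patterns would need adjusting for TikTok or YouTube):

```python
import re

_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_NUMBER = r"\d[\d,]*(?:\.\d+)?"


def parse_count(text):
    """Convert strings like '16M', '189K', '8,977' or '1.2M' to an int."""
    match = re.fullmatch(rf"({_NUMBER})\s*([KMB]?)", text.strip(), re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    return int(round(number * _SUFFIX.get(match.group(2).upper(), 1)))


def parse_og_title(title):
    """Split 'REAL NAME (@handle). ...' into (real_name, handle)."""
    match = re.match(r"\s*(.*?)\s*\(@([\w.]+)\)", title)
    if not match:
        return None, None
    return match.group(1), match.group(2)


def parse_og_description(description):
    """Pull follower, following, and post counts out of the description."""
    counts = {}
    pattern = rf"({_NUMBER}[KMB]?)\s+(Followers|Following|Posts)\b"
    for value, label in re.findall(pattern, description, re.IGNORECASE):
        counts[label.lower()] = parse_count(value)
    return counts
```

Note that rounded values like "16M" only give an estimate, which is presumably why the output field is called estimatedFollowers.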

What Broke (And How I Fixed It)

Attempt 1: Puppeteer + Apify Proxy

Used PuppeteerCrawler to search Google and visit profiles. Google CAPTCHA'd me. Instagram detected headless Chrome. Got 0 results.

Attempt 2: crawl4ai on VPS (direct IP)

Deployed crawl4ai (real Chromium) on a cheap Contabo VPS. Worked for normal sites but Google and Instagram both blocked the datacenter IP. 0 results again.

Attempt 3: crawl4ai + Apify proxy pipeline

The fix: route crawl4ai's browser traffic through Apify's proxy pool.

  • Google searches go through GOOGLE_SERP proxy group (designed for Google)
  • Instagram profile fetches go through RESIDENTIAL proxy group (residential IPs)
  • Use a lightweight HTTP fetch endpoint (no browser needed for profile pages)

This is what finally worked consistently.
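The routing rule itself is tiny. A sketch, assuming Apify's documented proxy URL format (`groups-<GROUP>` as the username, your proxy password, host `proxy.apify.com:8000`); double-check the exact format in your Apify console before relying on it. `proxy_group_for` and `proxy_url_for` are hypothetical helper names.

```python
from urllib.parse import urlsplit

# Assumed proxy endpoint; confirm against your own Apify account settings.
APIFY_PROXY_HOST = "proxy.apify.com:8000"


def proxy_group_for(url):
    """Google searches use GOOGLE_SERP; everything else goes residential."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "google.com" or host.endswith(".google.com"):
        return "GOOGLE_SERP"
    return "RESIDENTIAL"


def proxy_url_for(url, password):
    """Build the proxy URL for a target URL (password is your Apify proxy password)."""
    return f"http://groups-{proxy_group_for(url)}:{password}@{APIFY_PROXY_HOST}"
```

Keeping the decision in one function means every fetch asks the same question ("which group does this host belong to?") instead of hard-coding proxies at each call site.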

The Gemma 4 Enhancement

The VPS also runs Google's Gemma 4 (2B parameter model) via Ollama. When the regex-based profile extraction from SERP results misses something, Gemma acts as an intelligent fallback:

"Given these Google search results, extract all Instagram profile URLs, 
usernames, display names, and follower counts. Return JSON."

With think: false (disabling chain-of-thought reasoning), Gemma responds in 3-5 seconds instead of 60. For a simple extraction task like this, the thinking overhead isn't worth it.
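A sketch of the regex-first, LLM-fallback flow, assuming Ollama's default local endpoint and its `/api/generate` request fields (`stream`, `format`, `think`). The model tag below is a placeholder, not the real one, and the regex and skip list are my own guesses at what "regex-based extraction" might look like.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "your-gemma-tag"  # placeholder; use whatever tag `ollama list` shows

PROFILE_RE = re.compile(r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)/?")
# Path segments that are not profiles (illustrative, not exhaustive).
NON_PROFILE = {"p", "reel", "reels", "explore", "stories", "tv", "accounts"}

PROMPT = (
    "Given these Google search results, extract all Instagram profile URLs, "
    "usernames, display names, and follower counts. Return JSON.\n\n"
)


def regex_usernames(serp_text):
    """Cheap first pass: unique usernames from profile URLs, in order seen."""
    found = []
    for name in PROFILE_RE.findall(serp_text):
        name = name.rstrip(".")  # drop sentence-ending dots
        if name and name.lower() not in NON_PROFILE and name not in found:
            found.append(name)
    return found


def build_payload(serp_text):
    """Request body for Ollama, with thinking disabled for speed."""
    return {
        "model": MODEL,
        "prompt": PROMPT + serp_text,
        "stream": False,
        "format": "json",
        "think": False,
    }


def llm_extract(serp_text, timeout=120):
    """Ask the local model; only called when the regex pass finds nothing."""
    data = json.dumps(build_payload(serp_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return json.loads(body["response"])


def extract(serp_text):
    names = regex_usernames(serp_text)
    if names:
        return {"source": "regex", "usernames": names}
    return {"source": "llm", "result": llm_extract(serp_text)}
```

The model only runs when the cheap path comes back empty, so most requests never pay the LLM latency at all.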

Real Results

Running "beauty" niche on Instagram, 5 results requested:

Username          | Real Name             | Followers | Source
@mikaylajmakeup   | Mikayla Jane Nogueira | 3M        | og_meta_enriched
@ericataylor2347  | Erica Taylor          | 2M        | og_meta_enriched
@darcybylauren    | lauren janelle        | 189K      | og_meta_enriched
@amandaensing     | Amanda Ensing         | 1M        | og_meta_enriched
@jamiegenevieve   | Jamie Genevieve       | 1M        | og_meta_enriched

All real names (not just handles), all real follower counts, all in about 2 minutes.

Cost Breakdown

Component                       | Monthly Cost
Contabo VPS (6 vCPU, 12GB RAM)  | Under $15
Apify Creator Plan              | $1
Apify proxy usage               | ~$2-5 per 1000 searches
Total                           | ~$11-14/month

Compare that to $200-500/month for commercial influencer tools.

The Code

The full source is on GitHub: influencer-marketing-intel

Or try it directly on Apify (no code needed): Influencer Marketing Intelligence

Input:

{
  "niche": "beauty",
  "platforms": ["instagram", "tiktok", "youtube"],
  "maxResults": 50,
  "followerRange": "micro_10k_100k"
}

Output: structured JSON with username, displayName, estimatedFollowers, bio, contactEmails, nicheTags, profileUrl for each influencer found.
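If you want to post-filter results yourself, the followerRange value can be turned into numeric bounds. This sketch assumes the value always looks like `<label>_<min>_<max>` with optional k/m suffixes, which I'm inferring from the single example above; the actor may accept other formats.

```python
import re

_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_follower_range(value):
    """'micro_10k_100k' -> (10000, 100000). Format is assumed, not documented here."""
    match = re.fullmatch(r"[a-z]+_(\d+)([km]?)_(\d+)([km]?)", value.lower())
    if not match:
        raise ValueError(f"unrecognized follower range: {value!r}")
    low = int(match.group(1)) * _MULTIPLIER[match.group(2)]
    high = int(match.group(3)) * _MULTIPLIER[match.group(4)]
    return low, high


def in_range(record, follower_range):
    """True if the record's estimatedFollowers falls inside the range."""
    low, high = parse_follower_range(follower_range)
    followers = record.get("estimatedFollowers")
    return followers is not None and low <= followers <= high
```

For example, `in_range({"estimatedFollowers": 50000}, "micro_10k_100k")` keeps the record, while a 3M-follower account would be dropped.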

What I'd Do Differently

  1. Start with the OG meta approach from day one. I wasted weeks trying to make Puppeteer work on Instagram. The Googlebot UA trick was the breakthrough.

  2. Don't fight anti-bot systems; route around them. Residential proxies cost pennies and save hours of debugging.

  3. Local LLMs for extraction are underrated. Gemma 4 on a VPS backs up brittle regex patterns. When Instagram changes its HTML structure, Gemma adapts. Regex doesn't.


I build scraping tools: 57 actors on the Apify Store, 869 users. If you have a data problem that needs automating, I've probably already built the tool.

Follow the build log: @ai_in_it on X
