DEV Community

Olamide Olaniyan

Build a LinkedIn B2B Lead Scraper (Extract Commenters from Viral Posts)

In B2B sales, cold outreach is getting harder. Response rates are dropping because everyone is using the same automated tools to spam the same lists.

The secret to high-converting B2B outreach is intent.

If someone comments on a viral LinkedIn post about "The struggles of managing AWS infrastructure," and you sell an AWS management tool—that person is a warm lead.

In this tutorial, I'll show you how to build a Python script that takes a viral LinkedIn post URL, extracts every person who commented on it, and exports their profile data to a CSV for highly targeted outreach.

The Problem with LinkedIn Scraping

LinkedIn has some of the most aggressive anti-scraping measures on the web. If you scrape it directly with Selenium or BeautifulSoup from a logged-in account, that account is likely to be restricted or permanently banned.

To do this safely, we will use the SociaVault API. It handles all the proxy rotation, headless browser management, and CAPTCHA solving on the backend. You just make a simple API call.

Prerequisites

  • Python 3.8+
  • requests library
  • pandas library
  • A SociaVault API key (Get 1,000 free credits at sociavault.com)
pip install requests pandas

Step 1: The Script Setup

Create a file called linkedin_scraper.py.

import requests
import pandas as pd
import time

API_KEY = 'your_sociavault_api_key'
BASE_URL = 'https://api.sociavault.com/v1/linkedin'

headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}

Step 2: Extracting the Post ID

LinkedIn URLs look like this:
https://www.linkedin.com/posts/username_this-is-the-post-title-activity-7165432109876543210-AbCd

The actual Post ID is the 19-digit number (7165432109876543210). Let's write a quick helper to extract it.

import re

def extract_post_id(url):
    # Standard share URLs embed the ID as "...-activity-<id>-<suffix>";
    # the suffix is optional, so don't require a trailing dash after the digits
    match = re.search(r'activity-(\d+)', url)
    if match:
        return match.group(1)

    # Handle alternative URL formats
    match = re.search(r'urn:li:activity:(\d+)', url)
    if match:
        return match.group(1)

    raise ValueError("Could not find Post ID in URL")
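A quick sanity check for the helper (restated here so the snippet runs on its own), covering both the share-URL and urn formats:

```python
import re

def extract_post_id(url):
    # Same logic as above: try the share-URL format, then the urn format
    match = re.search(r'activity-(\d+)', url) or re.search(r'urn:li:activity:(\d+)', url)
    if match:
        return match.group(1)
    raise ValueError("Could not find Post ID in URL")

share_url = "https://www.linkedin.com/posts/example_post-activity-7165432109876543210-AbCd"
urn_url = "https://www.linkedin.com/feed/update/urn:li:activity:7165432109876543210/"

print(extract_post_id(share_url))  # 7165432109876543210
print(extract_post_id(urn_url))    # 7165432109876543210
```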

Step 3: Fetching the Commenters

Now we'll hit the SociaVault endpoint to get the comments for that specific post.

def get_post_commenters(post_id, max_comments=100):
    print(f"Fetching comments for post {post_id}...")

    leads = []

    try:
        response = requests.get(
            f"{BASE_URL}/post/comments",
            headers=headers,
            params={
                'post_id': post_id,
                'limit': max_comments
            }
        )

        if response.status_code == 200:
            comments = response.json().get('data', [])

            for comment in comments:
                author = comment.get('author', {})

                # We only want real people, not company pages
                if author.get('type') == 'USER':
                    leads.append({
                        'Full Name': author.get('name'),
                        'Headline': author.get('headline'), # e.g., "CTO at TechCorp"
                        'Profile URL': author.get('profile_url'),
                        'Comment Text': comment.get('text'),
                        'Engagement': comment.get('likes_count')
                    })

            return leads
        else:
            print(f"API Error: {response.text}")
            return []

    except Exception as e:
        print(f"Request failed: {e}")
        return []
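Endpoints backed by LinkedIn data can fail transiently or get throttled. Here's a minimal retry-with-backoff wrapper you can put around the fetch — a sketch, not part of the SociaVault API; the backoff schedule is an assumption, so check the docs for actual rate limits:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=2):
    """Call fn(); retry on exceptions or an empty result with exponential backoff.

    Note: an empty result triggers a retry, so a post with zero comments
    will be fetched max_attempts times — acceptable for a sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
            if result:
                return result
        except Exception as e:
            print(f"Attempt {attempt} failed: {e}")
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))
    return []

# Example with a function that succeeds immediately:
print(with_retries(lambda: ["ok"], base_delay=0))  # ['ok']
```

Usage with the scraper would look like `with_retries(lambda: get_post_commenters(post_id, max_comments=200))`.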

Step 4: Filtering and Exporting

We don't just want a raw list. We want to filter out people who left low-value comments like "CFBR" (Commenting for better reach) or "Following". We want people who actually engaged with the topic.

def process_leads(leads, output_filename):
    if not leads:
        print("No leads found.")
        return

    df = pd.DataFrame(leads)

    # Remove duplicates (if someone commented twice)
    df = df.drop_duplicates(subset=['Profile URL'])

    # Keep only substantive comments (longer than 10 characters)
    df = df[df['Comment Text'].str.len() > 10]

    # Filter out common engagement-pod phrases (exact match after trimming,
    # so genuine comments that merely contain these words are kept)
    spam_phrases = ['cfbr', 'following', 'great post', 'agree']
    df = df[~df['Comment Text'].str.lower().str.strip().isin(spam_phrases)]

    print(f"\nExtracted {len(df)} high-quality leads!")

    # Show a preview
    for index, row in df.head(3).iterrows():
        print("-" * 50)
        print(f"Name: {row['Full Name']}")
        print(f"Headline: {row['Headline']}")
        print(f"Comment: {row['Comment Text'][:100]}...")

    # Export to CSV
    df.to_csv(output_filename, index=False)
    print(f"\nSaved leads to {output_filename}")

# Run the script
if __name__ == "__main__":
    # Example viral post URL
    target_url = "https://www.linkedin.com/posts/example_post-activity-7165432109876543210-AbCd"

    try:
        post_id = extract_post_id(target_url)
        leads = get_post_commenters(post_id, max_comments=200)
        process_leads(leads, "linkedin_warm_leads.csv")
    except Exception as e:
        print(f"Error: {e}")

The Outreach Strategy

Now you have a CSV file named linkedin_warm_leads.csv containing the names, job titles, and profile URLs of people who engaged with a specific topic.

Instead of sending a generic cold message, you can send a highly personalized connection request:

"Hey [Name], saw your comment on [Author]'s post about AWS infrastructure. I completely agree with your point about [Reference their comment]. We actually built a tool that solves exactly that. Open to connecting?"

In my experience, this approach can see 40-50% acceptance rates, compared to the roughly 5% average of standard cold outreach.
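The CSV columns from the script map straight onto that template. Here's a sketch of generating a first draft per lead — the column names match the scraper's output, the wording is a trimmed version of the example above, and referencing the lead's actual comment is still best done by hand:

```python
import pandas as pd

# Hypothetical template; the topic ("AWS infrastructure") is hard-coded
# here and should come from the post you scraped.
TEMPLATE = (
    "Hey {first_name}, saw your comment on that post about AWS infrastructure. "
    "I completely agree with your point. We actually built a tool that solves "
    "exactly that. Open to connecting?"
)

def build_messages(df):
    # Pair each profile URL with a personalized first draft
    messages = []
    for _, row in df.iterrows():
        first_name = str(row["Full Name"]).split()[0]
        messages.append({
            "Profile URL": row["Profile URL"],
            "Message": TEMPLATE.format(first_name=first_name),
        })
    return messages

# In practice: leads = pd.read_csv("linkedin_warm_leads.csv")
leads = pd.DataFrame([{
    "Full Name": "Jane Doe",
    "Profile URL": "https://www.linkedin.com/in/janedoe",
    "Comment Text": "We hit this exact scaling wall last year.",
}])
for m in build_messages(leads):
    print(m["Message"])
```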

Scale Your Lead Gen

If you want to scale this, you can use SociaVault to automate the entire pipeline. You can search for posts by keyword, extract the commenters, and feed them directly into your CRM via API.

Get your free API key at SociaVault.com and start building your intent-based lead machine today.
