DEV Community

agenthustler

How to Scrape LinkedIn Connections for Network Analysis

Understanding your professional network can reveal hidden patterns — clusters of industry contacts, potential introductions, and career trajectory insights. In this tutorial, we'll build a Python tool that turns your LinkedIn connection export into an analyzable network graph.

The Approach: Export + Enrich

LinkedIn lets you export your own connections as CSV via Settings > Data Privacy > Get a copy of your data. We'll parse that export and enrich it with publicly available data.

Setting Up

pip install requests pandas networkx matplotlib

Sign up for ScraperAPI to handle request routing when enriching profile data at scale.

Parsing Your LinkedIn Export

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

def load_connections(csv_path):
    # LinkedIn prepends a short "Notes:" preamble before the real header,
    # so skip those rows (3 in my export; verify against yours)
    df = pd.read_csv(csv_path, skiprows=3)
    df.columns = [c.strip() for c in df.columns]
    df["Connected On"] = pd.to_datetime(df["Connected On"], errors="coerce")
    return df

df = load_connections("Connections.csv")
print(f"Total connections: {len(df)}")
print(f"Top companies:\n{df['Company'].value_counts().head(10)}")
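The skiprows=3 value matched my export, but the length of LinkedIn's preamble has varied across export versions. A small illustrative helper (find_header_row is my own name, not part of any library) that locates the header instead of hard-coding an offset:

```python
def find_header_row(lines):
    """Return the index of the line holding the real CSV header.

    LinkedIn exports prepend a short "Notes:" preamble whose length
    has changed over time, so detecting the header row is safer than
    hard-coding skiprows.
    """
    for i, line in enumerate(lines):
        # The connections export's header starts with the First Name column
        if line.startswith("First Name,"):
            return i
    raise ValueError("Header row not found in export")
```

You can then pass the result straight to pandas, e.g. `pd.read_csv(path, skiprows=find_header_row(open(path, encoding="utf-8").read().splitlines()))`.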

Enriching Profiles with Public Data

Route the search requests through ScraperAPI so repeated lookups aren't blocked:

import requests
import time
from urllib.parse import quote_plus

SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"

def enrich_profile(name, company):
    # URL-encode the query so spaces, ampersands, etc. survive the round trip
    query = quote_plus(f"{name} {company} site:linkedin.com/in")
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": f"https://www.google.com/search?q={query}",
        "render": "false",
    }
    resp = requests.get("https://api.scraperapi.com", params=params, timeout=60)
    time.sleep(1.5)  # stay well under rate limits
    return resp.text

for _, row in df.head(5).iterrows():
    html = enrich_profile(
        row["First Name"] + " " + row["Last Name"],
        row["Company"]
    )
    print(f"Fetched data for {row['First Name']} ({len(html)} chars)")
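The raw HTML isn't useful by itself; the piece we actually want is the profile URL. A minimal sketch (regex-based, and assuming standard linkedin.com/in/ slugs — result markup varies, so treat matches as candidates) that pulls profile URLs out of the search results:

```python
import re

def extract_profile_urls(html):
    """Pull linkedin.com/in/... profile URLs out of raw search-result HTML."""
    pattern = r"https?://[a-z]{0,3}\.?linkedin\.com/in/[A-Za-z0-9\-_%]+"
    # Deduplicate while preserving the order results appeared in
    seen = []
    for url in re.findall(pattern, html):
        if url not in seen:
            seen.append(url)
    return seen
```

The first match is usually the best candidate, but verifying it against the connection's company is a sensible sanity check before storing it.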

Building the Network Graph

Group connections by company and create a co-affiliation network:

def build_network(df):
    G = nx.Graph()
    # Use full names so connections who share a first name
    # don't collapse into a single node
    df = df.assign(name=df["First Name"].str.cat(df["Last Name"], sep=" "))
    for company, group in df.groupby("Company"):
        if len(group) < 2:
            continue
        people = group["name"].tolist()
        # Link every pair of connections who share an employer
        for i, p1 in enumerate(people):
            for p2 in people[i + 1:]:
                G.add_edge(p1, p2, company=company)
    return G

G = build_network(df)
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")

degrees = sorted(G.degree(), key=lambda x: x[1], reverse=True)
for name, deg in degrees[:10]:
    print(f"  {name}: {deg} shared affiliations")
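Degree only ranks hub contacts. The "bridge connectors" between clusters show up through betweenness centrality instead — a toy sketch on a hand-built graph (in real use you'd pass the G built above):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two tight triangles joined by a single edge: Cat and Dee are the bridges.
toy = nx.Graph()
toy.add_edges_from([
    ("Ana", "Ben"), ("Ben", "Cat"), ("Cat", "Ana"),  # cluster 1
    ("Dee", "Eli"), ("Eli", "Flo"), ("Flo", "Dee"),  # cluster 2
    ("Cat", "Dee"),                                  # the bridge
])

# Betweenness = fraction of shortest paths passing through each node;
# nodes linking otherwise-separate clusters score highest
centrality = nx.betweenness_centrality(toy)
bridges = sorted(centrality, key=centrality.get, reverse=True)[:2]
print(bridges)  # Cat and Dee top the ranking

# Community detection recovers the two clusters
communities = greedy_modularity_communities(toy)
print([sorted(c) for c in communities])
```

On a real co-affiliation graph, high-betweenness names are the people most likely to span industries or employers — often the most valuable contacts for introductions.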

Visualizing Clusters

plt.figure(figsize=(14, 10))
pos = nx.spring_layout(G, k=0.5, seed=42)
nx.draw_networkx(
    G, pos,
    node_size=[G.degree(n) * 50 for n in G.nodes()],
    font_size=7, alpha=0.8, edge_color="#cccccc"
)
plt.title("LinkedIn Network: Co-Company Affiliations")
plt.savefig("network.png", dpi=150)

Temporal Growth Analysis

df["month"] = df["Connected On"].dt.to_period("M")
growth = df.groupby("month").size().cumsum()
growth.plot(figsize=(10, 4), title="Network Growth Over Time")
plt.ylabel("Total Connections")
plt.savefig("growth.png")
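Cumulative growth hides slowdowns. A per-month count of new connections makes momentum shifts visible — here sketched on toy dates standing in for the export's Connected On column:

```python
import pandas as pd

# Toy dates in place of the real "Connected On" column
sample = pd.DataFrame({"Connected On": pd.to_datetime(
    ["2023-01-05", "2023-01-20", "2023-02-10", "2023-04-01"])})

# New connections per calendar month (quiet months simply don't appear)
monthly = sample.groupby(sample["Connected On"].dt.to_period("M")).size()
print(monthly)
```

Plotting `monthly` as a bar chart alongside the cumulative curve shows bursts (conferences, job changes) that the running total smooths away.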

Scaling Up

For monitoring large enrichment jobs, ScrapeOps provides dashboards that track success rates across thousands of requests.
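If you'd rather keep monitoring in-process, a tiny success-rate tracker covers the basics (RequestTracker is a hypothetical helper of my own, not the ScrapeOps API):

```python
class RequestTracker:
    """Minimal local success-rate tracker for an enrichment run."""

    def __init__(self):
        self.ok = 0
        self.failed = 0

    def record(self, success):
        # Call once per request with True/False
        if success:
            self.ok += 1
        else:
            self.failed += 1

    @property
    def success_rate(self):
        total = self.ok + self.failed
        return self.ok / total if total else 0.0

tracker = RequestTracker()
for status in (200, 200, 500, 200):
    tracker.record(status == 200)
print(f"{tracker.success_rate:.0%}")  # 75%
```

Wrapping `enrich_profile` calls with `tracker.record(resp.status_code == 200)` gives you a running health signal without any external dashboard.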

Key Takeaways

  • LinkedIn's CSV export gives you structured connection data to analyze
  • NetworkX reveals hidden clusters and bridge connectors
  • Proxy services like ScraperAPI make enrichment reliable
  • Temporal analysis shows networking momentum and patterns

Your professional network is a dataset waiting to be explored. Start with the export, build the graph, and discover the patterns.


This tutorial uses your own LinkedIn data export. Always respect platform terms of service and rate limits.
