How to Scrape LinkedIn Connections for Network Analysis
Understanding your professional network can reveal hidden patterns: clusters of industry contacts, potential introductions, and career trajectory insights. In this tutorial, we'll build a Python tool that turns your LinkedIn connection export into a graph you can analyze.
The Approach: Export + Enrich
LinkedIn lets you export your own connections as CSV via Settings > Data Privacy > Get a copy of your data. We'll parse that export and enrich it with publicly available data.
Setting Up
pip install requests pandas networkx matplotlib
Sign up for ScraperAPI to handle request routing when enriching profile data at scale.
Parsing Your LinkedIn Export
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
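def load_connections(csv_path):
    # The export opens with a short notes preamble (assumed here to span three lines)
    df = pd.read_csv(csv_path, skiprows=3)
    # Header names can carry stray whitespace
    df.columns = [c.strip() for c in df.columns]
    df["Connected On"] = pd.to_datetime(df["Connected On"])
    return df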
df = load_connections("Connections.csv")
print(f"Total connections: {len(df)}")
print(f"Top companies:\n{df['Company'].value_counts().head(10)}")
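The hardcoded skiprows=3 assumes the notes preamble is always the same length. As a hedged alternative, the sketch below scans for the header row instead. The helper name find_header_row and the sample text are hypothetical and only mimic the shape of an export; they are not taken from a real file.

```python
import io
import pandas as pd

def find_header_row(lines, marker="First Name"):
    """Return the index of the first line that starts with the header marker."""
    for i, line in enumerate(lines):
        # Ignore a leading byte-order mark or quote character before comparing
        if line.lstrip('\ufeff"').startswith(marker):
            return i
    raise ValueError(f"No header row starting with {marker!r} found")

# Hypothetical sample: a notes preamble followed by the real table
sample = (
    "Notes:\n"
    '"Some connections may be missing email addresses."\n'
    "\n"
    "First Name,Last Name,Company,Connected On\n"
    "Ada,Lovelace,Analytical Engines,15 Jan 2023\n"
    "Alan,Turing,Bletchley,02 Feb 2023\n"
)

lines = sample.splitlines()
header_row = find_header_row(lines)
# Parse only the lines from the header onward
sample_df = pd.read_csv(io.StringIO("\n".join(lines[header_row:])))
print(sample_df.columns.tolist())
```

For a real file, the same idea applies: read the lines, locate the header, and parse from there.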
Enriching Profiles with Public Data
The example below routes a search request through ScraperAPI. A residential proxy service such as ThorData could be substituted for the same purpose:
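import requests
import time
from urllib.parse import quote_plus
SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"
def enrich_profile(name, company):
    # URL-encode the query so spaces and characters like "&" survive inside the target URL
    query = quote_plus(f"{name} {company} site:linkedin.com/in")
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": f"https://www.google.com/search?q={query}",
        "render": "false"
    }
    resp = requests.get("https://api.scraperapi.com", params=params, timeout=60)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of returning an error page
    time.sleep(1.5)  # pause between requests
    return resp.text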
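for _, row in df.head(5).iterrows():
    # Skip rows with missing fields; blank names or companies would break the query
    if row[["First Name", "Last Name", "Company"]].isna().any():
        continue
    html = enrich_profile(
        f"{row['First Name']} {row['Last Name']}",
        row["Company"]
    )
    print(f"Fetched data for {row['First Name']} ({len(html)} chars)")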
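The loop above only reports how much HTML came back. One way to turn that HTML into something usable is to pull out candidate profile URLs with a regular expression. This is a minimal sketch: the function name extract_profile_urls and the sample markup are hypothetical, and real search result pages may wrap or rewrite links, so treat the pattern as a starting point.

```python
import re

# Matches profile URLs on linkedin.com, optionally with a subdomain such as "www." or "uk."
PROFILE_URL_RE = re.compile(
    r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/in/[A-Za-z0-9\-_%]+"
)

def extract_profile_urls(html):
    """Return unique profile URLs in the order they first appear."""
    seen = []
    for url in PROFILE_URL_RE.findall(html):
        if url not in seen:
            seen.append(url)
    return seen

# Hypothetical markup standing in for a fetched results page
sample_html = (
    '<a href="https://www.linkedin.com/in/ada-lovelace">Ada</a> '
    '<a href="https://uk.linkedin.com/in/alan-turing-123">Alan</a> '
    '<a href="https://www.linkedin.com/in/ada-lovelace">duplicate</a>'
)

urls = extract_profile_urls(sample_html)
print(urls)
```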
Building the Network Graph
Group connections by company and create a co-affiliation network:
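def build_network(df):
    G = nx.Graph()
    companies = df.groupby("Company")
    for company, group in companies:
        # A company needs at least two people to produce an edge
        if len(group) < 2 or pd.isna(company):
            continue
        # Use full names as node IDs so people who share a first name stay distinct
        people = (
            group["First Name"].fillna("") + " " + group["Last Name"].fillna("")
        ).str.strip().tolist()
        # Connect every pair of people at the same company
        for i, p1 in enumerate(people):
            for p2 in people[i+1:]:
                G.add_edge(p1, p2, company=company)
    return G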
G = build_network(df)
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
degrees = sorted(G.degree(), key=lambda x: x[1], reverse=True)
for name, deg in degrees[:10]:
    print(f" {name}: {deg} shared affiliations")
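Because every company group becomes a fully connected clique, the clusters in a graph like this can be read off as connected components. The sketch below shows the idea on a hypothetical toy graph. The names, companies, and the helper company_clusters are illustrative assumptions, not part of the export.

```python
import networkx as nx

# Hypothetical toy graph with the same edge attribute as the tutorial's graph
toy = nx.Graph()
toy.add_edge("Ada Lovelace", "Charles Babbage", company="Analytical Engines")
toy.add_edge("Alan Turing", "Joan Clarke", company="Bletchley")
toy.add_edge("Joan Clarke", "Gordon Welchman", company="Bletchley")
toy.add_edge("Alan Turing", "Gordon Welchman", company="Bletchley")

def company_clusters(graph):
    """Return (companies, members) per connected component, largest first."""
    clusters = []
    for nodes in nx.connected_components(graph):
        sub = graph.subgraph(nodes)
        # Collect the company labels found on edges inside this component
        companies = {data["company"] for _, _, data in sub.edges(data=True)}
        clusters.append((sorted(companies), sorted(nodes)))
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)

clusters = company_clusters(toy)
for companies, members in clusters:
    print(companies, members)
```

A component that lists more than one company would point to someone who appears under two employers, which is where a bridge between clusters would show up.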
Visualizing Clusters
plt.figure(figsize=(14, 10))
pos = nx.spring_layout(G, k=0.5, seed=42)
nx.draw_networkx(
    G, pos,
    node_size=[G.degree(n) * 50 for n in G.nodes()],
    font_size=7, alpha=0.8, edge_color="#cccccc"
)
plt.title("LinkedIn Network: Co-Company Affiliations")
plt.savefig("network.png", dpi=150)
Temporal Growth Analysis
df["month"] = df["Connected On"].dt.to_period("M")
growth = df.groupby("month").size().cumsum()
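# Plot on a fresh figure rather than reusing the axes left over from the network graph
fig, ax = plt.subplots(figsize=(10, 4))
growth.plot(ax=ax, title="Network Growth Over Time")
ax.set_ylabel("Total Connections")
fig.savefig("growth.png")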
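A cumulative curve always rises, so it can hide slow periods. One hedged way to look at momentum is to count new connections per month and smooth them with a rolling mean. The dates below are hypothetical, and the three-month window is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical connection dates standing in for the "Connected On" column
dates = pd.to_datetime([
    "2023-01-05", "2023-01-20",
    "2023-02-11",
    "2023-04-02", "2023-04-15", "2023-04-30",
])

# One event per date, summed into calendar-month bins; empty months count as 0
events = pd.Series(1, index=dates)
monthly = events.resample("MS").sum()

# Three-month rolling average of new connections
rolling = monthly.rolling(window=3, min_periods=1).mean()

print(monthly)
print(rolling)
```

Months with no new connections appear as zeros here, whereas a plain groupby on the month column would leave them out.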
Scaling Up
For monitoring large enrichment jobs, ScrapeOps provides dashboards that track success rates across thousands of requests.
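Before wiring up an external dashboard, a success rate can be tracked locally with a simple counter. This sketch does not use any ScrapeOps API. The class RequestStats is a hypothetical stand-in that tallies HTTP status codes as requests complete.

```python
from collections import Counter

class RequestStats:
    """Tally HTTP status codes and report the share of 2xx responses."""

    def __init__(self):
        self.status_counts = Counter()

    def record(self, status_code):
        self.status_counts[status_code] += 1

    def success_rate(self):
        total = sum(self.status_counts.values())
        if total == 0:
            return 0.0
        ok = sum(n for code, n in self.status_counts.items() if 200 <= code < 300)
        return ok / total

# Hypothetical run: two successes, one rate limit, one server error
stats = RequestStats()
for code in (200, 200, 429, 500):
    stats.record(code)

print(f"Success rate: {stats.success_rate():.0%}")
print(dict(stats.status_counts))
```

In an enrichment loop, calling record with each response's status code would give a running picture of how many requests are getting through.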
Key Takeaways
- LinkedIn's CSV export gives you structured connection data to analyze
- NetworkX reveals hidden clusters and bridge connectors
- Proxy services like ScraperAPI make enrichment reliable
- Temporal analysis shows networking momentum and patterns
Your professional network is a dataset waiting to be explored. Start with the export, build the graph, and discover the patterns.
This tutorial uses your own LinkedIn data export. Always respect platform terms of service and rate limits.