Compensation transparency has become one of the most powerful forces reshaping the tech job market. Platforms like Levels.fyi sit at the center of this revolution, aggregating verified salary data from thousands of tech professionals across companies like Google, Meta, Amazon, Apple, Netflix, and hundreds more.
Whether you're a recruiter benchmarking compensation packages, a job seeker evaluating an offer, a startup founder setting salary bands, or a data analyst studying compensation trends — extracting structured data from Levels.fyi can unlock insights that are otherwise buried behind countless web pages.
In this guide, we'll break down how Levels.fyi organizes its data, walk through practical scraping approaches using both Node.js and Python, and show how to scale your extraction using Apify's cloud scraping platform.
Understanding How Levels.fyi Structures Its Data
Before writing a single line of scraping code, you need to understand the data architecture of the site you're targeting. Levels.fyi organizes compensation data across several key dimensions:
Company Profiles
Each company has a dedicated page showing aggregated compensation data. For a company like Google, you'll find:
- Total compensation by level (L3, L4, L5, L6, L7, etc.)
- Compensation breakdowns: base salary, stock/equity, bonus
- Historical trends: how pay has changed year over year
- Location-based adjustments: Bay Area vs. Seattle vs. New York vs. remote
Role and Level Taxonomy
Levels.fyi maintains a proprietary leveling system that maps different company titles to equivalent levels. For example:
| Company | Title | Levels.fyi Level |
|---|---|---|
| Google | L5 | Senior |
| Meta | E5 | Senior |
| Amazon | SDE II | Mid-Senior |
| Apple | ICT4 | Senior |
This normalization is what makes cross-company comparison possible and is a key part of the data's value.
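This mapping can be sketched as a simple lookup table. The entries below are illustrative, drawn only from the example rows above — Levels.fyi's actual taxonomy is far larger:

```python
# Illustrative mapping (not Levels.fyi's real dataset): normalize
# company-specific titles onto a shared seniority scale.
LEVEL_MAP = {
    ("google", "L5"): "Senior",
    ("meta", "E5"): "Senior",
    ("amazon", "SDE II"): "Mid-Senior",
    ("apple", "ICT4"): "Senior",
}

def normalize_level(company: str, title: str) -> str:
    """Map a company-specific level to a normalized one."""
    return LEVEL_MAP.get((company.lower(), title), "Unknown")

print(normalize_level("meta", "E5"))  # Senior
```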
Compensation Components
Every data point on Levels.fyi includes a detailed breakdown:
- Base Salary: Annual fixed compensation
- Stock/Equity: RSUs, stock options, or equity grants (annualized)
- Bonus: Annual performance bonus, signing bonus (sometimes amortized)
- Total Compensation (TC): The sum of all components
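Assuming a multi-year stock vesting schedule and an optionally amortized signing bonus, the arithmetic behind TC can be sketched like this (the field names are ours, not Levels.fyi's):

```python
from dataclasses import dataclass

@dataclass
class CompPackage:
    base_salary: float
    stock_total: float            # total grant value over the vesting period
    vesting_years: int            # e.g. 4-year vest
    annual_bonus: float
    signing_bonus: float = 0.0
    signing_amortized_years: int = 1

    def total_comp(self) -> float:
        """Annualized TC: base + yearly stock + bonus (+ amortized signing)."""
        yearly_stock = self.stock_total / self.vesting_years
        yearly_signing = self.signing_bonus / self.signing_amortized_years
        return self.base_salary + yearly_stock + self.annual_bonus + yearly_signing

offer = CompPackage(base_salary=180_000, stock_total=400_000,
                    vesting_years=4, annual_bonus=27_000)
print(offer.total_comp())  # 307000.0
```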
Geographic Data
Salary figures are tagged with location information, which is critical because a $200K TC in San Francisco has very different purchasing power than $200K in Austin, Texas.
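As a rough illustration, TC can be normalized by a cost-of-living index. The index values below are invented for the example — a real analysis should use a maintained index from a relocation or labor-data provider:

```python
# Hypothetical cost-of-living indices (San Francisco = 1.00).
COL_INDEX = {
    "San Francisco, CA": 1.00,
    "Seattle, WA": 0.82,
    "Austin, TX": 0.65,
}

def col_adjusted_tc(tc: float, location: str) -> int:
    """Express TC in San Francisco-equivalent purchasing power."""
    return round(tc / COL_INDEX.get(location, 1.0))

print(col_adjusted_tc(200_000, "Austin, TX"))  # 307692
```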
The Technical Challenge of Scraping Levels.fyi
Levels.fyi is a modern React-based single-page application (SPA). This means:
- Data is loaded dynamically via API calls, not rendered in the initial HTML
- Content requires JavaScript execution to appear in the DOM
- Pagination and filtering happen client-side
- Rate limiting and bot detection are in place
This makes simple HTTP request-based scraping insufficient. You need either:
- A headless browser (Puppeteer, Playwright) to render the JavaScript
- API endpoint interception to capture the underlying data requests
Approach 1: Headless Browser Scraping with Node.js
Let's start with a Puppeteer-based approach that navigates the site and extracts compensation data. Note that the selectors below are illustrative — Levels.fyi's class names change as the site is updated, so verify them against the live DOM before relying on them:
```javascript
const puppeteer = require('puppeteer');

async function scrapeLevelsFyi(company) {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();

  // Set a realistic user agent
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/120.0.0.0 Safari/537.36'
  );

  // Navigate to the company's compensation page
  const url = `https://www.levels.fyi/companies/${company}/salaries`;
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });

  // Wait for salary data to load
  await page.waitForSelector('[class*="salary"]', { timeout: 15000 })
    .catch(() => console.log('Salary selector not found, trying alternatives...'));

  // Extract compensation data from the page
  const salaryData = await page.evaluate(() => {
    const results = [];
    const rows = document.querySelectorAll('tr, [class*="row"]');
    rows.forEach(row => {
      const cells = row.querySelectorAll('td, [class*="cell"]');
      if (cells.length >= 4) {
        results.push({
          level: cells[0]?.textContent?.trim(),
          title: cells[1]?.textContent?.trim(),
          totalComp: cells[2]?.textContent?.trim(),
          base: cells[3]?.textContent?.trim(),
          stock: cells[4]?.textContent?.trim() || null,
          bonus: cells[5]?.textContent?.trim() || null,
        });
      }
    });
    return results;
  });

  console.log(`Found ${salaryData.length} salary entries for ${company}`);
  await browser.close();
  return salaryData;
}

// Extract data for multiple companies
async function scrapeMultipleCompanies(companies) {
  const allData = {};
  for (const company of companies) {
    console.log(`Scraping ${company}...`);
    allData[company] = await scrapeLevelsFyi(company);
    // Respectful delay between requests
    await new Promise(resolve => setTimeout(resolve, 3000));
  }
  return allData;
}

// Usage
const companies = ['google', 'meta', 'amazon', 'apple', 'microsoft'];
scrapeMultipleCompanies(companies)
  .then(data => {
    const fs = require('fs');
    fs.writeFileSync('salary_data.json', JSON.stringify(data, null, 2));
    console.log('Data saved to salary_data.json');
  })
  .catch(console.error);
```
Intercepting API Calls for Cleaner Data
A more efficient approach is to intercept the network requests that Levels.fyi's frontend makes to its backend. The URL patterns matched below (`/api/` plus `salaries`) are an educated guess — inspect the actual requests in your browser's developer tools first:
```javascript
const puppeteer = require('puppeteer');

async function interceptSalaryAPI(company) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  const apiResponses = [];

  // Listen for API responses containing salary data
  page.on('response', async (response) => {
    const url = response.url();
    if (url.includes('/api/') && url.includes('salaries')) {
      try {
        const data = await response.json();
        apiResponses.push({
          url: url,
          data: data,
          timestamp: new Date().toISOString()
        });
      } catch (e) {
        // Not a JSON response, skip
      }
    }
  });

  await page.goto(
    `https://www.levels.fyi/companies/${company}/salaries`,
    { waitUntil: 'networkidle2' }
  );

  // Scroll to trigger lazy-loaded content
  await autoScroll(page);
  await browser.close();
  return apiResponses;
}

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 300;
      const timer = setInterval(() => {
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= document.body.scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 200);
    });
  });
}
```
Approach 2: Python-Based Extraction
If Python is more your style, here's an equivalent approach using Playwright:
```python
import asyncio
import json
from playwright.async_api import async_playwright

async def scrape_levels_fyi(company: str) -> list[dict]:
    """Scrape salary data for a specific company from Levels.fyi."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            )
        )
        page = await context.new_page()
        api_data = []

        # Intercept API responses
        async def handle_response(response):
            if "/api/" in response.url and "salaries" in response.url:
                try:
                    data = await response.json()
                    api_data.append(data)
                except Exception:
                    pass

        page.on("response", handle_response)

        url = f"https://www.levels.fyi/companies/{company}/salaries"
        await page.goto(url, wait_until="networkidle")

        # Scroll through the page to load all data
        previous_height = 0
        while True:
            current_height = await page.evaluate("document.body.scrollHeight")
            if current_height == previous_height:
                break
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await asyncio.sleep(1)
            previous_height = current_height

        await browser.close()
        return api_data

async def build_compensation_report(companies: list[str]):
    """Build a comprehensive compensation report across companies."""
    report = {}
    for company in companies:
        print(f"Extracting data for {company}...")
        data = await scrape_levels_fyi(company)
        report[company] = data
        await asyncio.sleep(2)  # Respectful delay
    return report

# Run the scraper
companies = ["google", "meta", "amazon", "apple", "netflix"]
report = asyncio.run(build_compensation_report(companies))

with open("compensation_report.json", "w") as f:
    json.dump(report, f, indent=2)

print(f"Report saved with data for {len(report)} companies")
```
Processing and Analyzing the Data
Once you have raw data, you'll want to structure it for analysis:
```python
import pandas as pd
import numpy as np

def process_salary_data(raw_data: dict) -> pd.DataFrame:
    """Transform raw salary data into a structured DataFrame."""
    records = []
    for company, entries in raw_data.items():
        for entry in entries:
            if isinstance(entry, dict):
                records.append({
                    "company": company,
                    "level": entry.get("level", "Unknown"),
                    "title": entry.get("title", ""),
                    "base_salary": parse_currency(entry.get("baseSalary", 0)),
                    "stock_grant": parse_currency(entry.get("stockGrantValue", 0)),
                    "bonus": parse_currency(entry.get("bonus", 0)),
                    "total_comp": parse_currency(entry.get("totalCompensation", 0)),
                    "location": entry.get("location", "Unknown"),
                    "years_experience": entry.get("yearsExperience", None),
                    "years_at_company": entry.get("yearsAtCompany", None),
                })
    df = pd.DataFrame(records)
    # Calculate derived metrics, guarding against zero total comp
    tc = df["total_comp"].replace(0, np.nan)
    df["equity_percentage"] = (df["stock_grant"] / tc * 100).round(1)
    df["base_percentage"] = (df["base_salary"] / tc * 100).round(1)
    return df

def parse_currency(value) -> float:
    """Parse currency strings like '$157.5K' into float values."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        cleaned = value.replace("$", "").replace(",", "").strip()
        multiplier = 1.0
        if cleaned.upper().endswith("K"):
            multiplier = 1000.0  # '157.5K' -> 157500.0, not 157.5
            cleaned = cleaned[:-1]
        try:
            return float(cleaned) * multiplier
        except ValueError:
            return 0.0
    return 0.0

# Generate comparison report
def generate_comparison(df: pd.DataFrame):
    """Generate cross-company compensation comparison."""
    summary = df.groupby(["company", "level"]).agg({
        "total_comp": ["mean", "median", "min", "max", "count"],
        "base_salary": "mean",
        "stock_grant": "mean",
        "bonus": "mean",
    }).round(0)
    print("\n=== Compensation Comparison by Company and Level ===")
    print(summary.to_string())
    return summary
```
Scaling with Apify: Cloud-Based Scraping Infrastructure
While local scripts work for small-scale extraction, real-world salary research requires scraping hundreds of companies and thousands of data points. This is where Apify excels.
Apify provides a cloud-based platform for running web scrapers (called "Actors") at scale. Here's how to use it for Levels.fyi data extraction:
Building a Custom Apify Actor

The example below uses the Apify SDK v2 API (`Apify.main`, `PuppeteerCrawler` with `handlePageFunction`); newer SDK versions built on Crawlee use a different interface, so adapt the code to the version you have installed:
```javascript
// Apify Actor for Levels.fyi salary extraction
const Apify = require('apify');

Apify.main(async () => {
  const input = await Apify.getInput();
  const { companies = ['google'], maxResults = 100 } = input;

  const requestQueue = await Apify.openRequestQueue();
  const dataset = await Apify.openDataset();

  // Queue company pages
  for (const company of companies) {
    await requestQueue.addRequest({
      url: `https://www.levels.fyi/companies/${company}/salaries`,
      userData: { company }
    });
  }

  const crawler = new Apify.PuppeteerCrawler({
    requestQueue,
    maxConcurrency: 3,
    navigationTimeoutSecs: 60,
    handlePageFunction: async ({ request, page }) => {
      const { company } = request.userData;

      // Wait for data to load
      await page.waitForTimeout(5000);

      // Extract salary table data
      const salaries = await page.evaluate(() => {
        const data = [];
        // Extract from rendered compensation tables
        document.querySelectorAll('[data-testid*="salary"], .salary-row')
          .forEach(el => {
            const text = el.textContent;
            data.push({ rawText: text });
          });
        return data;
      });

      // Store results
      for (const salary of salaries) {
        await dataset.pushData({
          company,
          ...salary,
          scrapedAt: new Date().toISOString(),
          sourceUrl: request.url,
        });
      }

      console.log(
        `Extracted ${salaries.length} salary entries for ${company}`
      );
    },
    handleFailedRequestFunction: async ({ request, error }) => {
      console.error(`Failed: ${request.url} - ${error.message}`);
    },
  });

  await crawler.run();
  console.log('Scraping complete!');
});
```
Calling an Apify Actor from Python
You can also trigger Apify Actors programmatically from Python:
```python
from apify_client import ApifyClient
import json

def run_levels_scraper(companies: list[str], api_token: str) -> list[dict]:
    """Run Levels.fyi scraper on Apify cloud."""
    client = ApifyClient(api_token)

    # Start the actor run
    run = client.actor("your-username/levels-fyi-scraper").call(
        run_input={
            "companies": companies,
            "maxResults": 500,
            "includeHistorical": True,
        },
        timeout_secs=300,
    )

    # Fetch results from the dataset
    results = []
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        results.append(item)

    print(f"Retrieved {len(results)} salary entries")
    return results

# Run the scraper
companies = [
    "google", "meta", "amazon", "apple", "microsoft",
    "netflix", "uber", "airbnb", "stripe", "coinbase"
]
results = run_levels_scraper(companies, "your_apify_api_token")

# Save results
with open("levels_fyi_data.json", "w") as f:
    json.dump(results, f, indent=2)
```
Practical Use Cases for Levels.fyi Data
1. Compensation Benchmarking for Recruiters
Recruiters can build real-time compensation benchmarks by scraping salary data across competing companies. This data helps craft competitive offers and reduces the back-and-forth in salary negotiations.
2. Career Planning and Offer Evaluation
Job seekers can extract data for their target company and level to understand whether an offer is at the 25th, 50th, or 75th percentile. Cross-referencing with location data helps account for cost-of-living differences.
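A minimal percentile check against scraped peer data points might look like this (the sample TC values are made up for illustration):

```python
def offer_percentile(offer_tc: float, peer_tcs: list[float]) -> float:
    """Percent of comparable data points at or below the offer."""
    at_or_below = sum(1 for tc in peer_tcs if tc <= offer_tc)
    return round(100 * at_or_below / len(peer_tcs), 1)

# Hypothetical peer TCs for the same company, level, and location
peers = [240_000, 265_000, 280_000, 295_000, 310_000, 340_000, 380_000]
print(offer_percentile(300_000, peers))  # 57.1
```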
3. Market Research for Startups
Startup founders setting compensation bands can use Levels.fyi data to understand what Big Tech pays at equivalent levels, then decide how much to offset with equity versus cash.
4. Academic and Economic Research
Researchers studying wage inequality, the gender pay gap, or the impact of remote work on compensation can build longitudinal datasets by scraping historical data points.
5. Investment Analysis
Understanding compensation trends at specific companies can provide signals about talent retention, hiring velocity, and overall company health — all relevant to investment decisions.
Handling Common Challenges
Dynamic Content Loading
Levels.fyi loads data asynchronously. Always use `waitUntil: 'networkidle2'` and add explicit waits for data elements:
```javascript
// Wait for specific data elements before extracting
await page.waitForFunction(() => {
  const rows = document.querySelectorAll('[class*="compensation"]');
  return rows.length > 0;
}, { timeout: 15000 });
```
Anti-Bot Measures
Rotate user agents, add random delays between requests, and consider using residential proxies for large-scale scraping. Apify's proxy infrastructure handles this automatically.
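A minimal sketch of the first two tactics, using only the standard library (keep a larger, regularly refreshed user-agent pool in practice):

```python
import random
import time

USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

def pick_user_agent() -> str:
    """Rotate user agents across requests."""
    return random.choice(USER_AGENTS)

def polite_sleep(base: float = 3.0, jitter: float = 2.0) -> float:
    """Sleep a randomized interval so request timing isn't uniform."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```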
Data Quality
Not all entries on Levels.fyi are verified. Build validation into your pipeline:
```python
def validate_salary_entry(entry: dict) -> bool:
    """Basic validation for salary data quality."""
    tc = entry.get("total_comp", 0)
    base = entry.get("base_salary", 0)
    # Filter obvious outliers
    if tc < 30000 or tc > 2000000:
        return False
    if base < 20000 or base > 500000:
        return False
    if base > tc:
        return False
    return True
```
Ethical Considerations and Best Practices
When scraping compensation data, keep these principles in mind:
- Respect robots.txt: Always check and honor the site's robots.txt directives
- Rate limiting: Don't overwhelm the server — add delays between requests
- Data privacy: Salary data should be aggregated, never tied to individuals
- Terms of service: Review and respect the platform's ToS
- Caching: Store results locally to avoid redundant requests
- Attribution: If publishing analysis, credit Levels.fyi as the data source
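The caching point can be as simple as a timestamped JSON file per company; a minimal sketch (the file layout is our choice, not a standard):

```python
import json
import os
import time

CACHE_DIR = "cache"  # local cache directory (illustrative)

def cache_path(company: str) -> str:
    return os.path.join(CACHE_DIR, f"{company}.json")

def load_cached(company: str, max_age_hours: float = 24.0):
    """Return cached results if they exist and are fresh enough."""
    path = cache_path(company)
    if not os.path.exists(path):
        return None
    if time.time() - os.path.getmtime(path) > max_age_hours * 3600:
        return None  # stale: re-scrape
    with open(path) as f:
        return json.load(f)

def save_cache(company: str, data) -> None:
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path(company), "w") as f:
        json.dump(data, f)
```

Wrap your scraper calls with `load_cached` first and `save_cache` after, so re-runs within the freshness window never hit the site.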
Conclusion
Levels.fyi is one of the richest sources of tech compensation data available. By combining headless browser techniques with cloud-based scraping infrastructure like Apify, you can extract, structure, and analyze salary data at scale.
Whether you're building a compensation benchmarking tool, evaluating a job offer, or conducting market research, the techniques covered in this guide give you the foundation to work with Levels.fyi data programmatically.
Start small with a single company, validate your extraction pipeline, and then scale up using Apify's cloud infrastructure to cover the entire tech industry's compensation landscape.