Yahoo Finance is one of the most popular financial data platforms on the internet, offering a wealth of information including real-time stock quotes, historical price data, financial statements, earnings reports, analyst ratings, and market news. For data analysts, quantitative researchers, and fintech developers, being able to extract this data programmatically is invaluable.
In this comprehensive guide, we'll explore Yahoo Finance's structure, demonstrate how to scrape stock prices, financial statements, and news feeds using both Python and Node.js, and show how to scale your extraction using Apify's cloud platform.
## Understanding Yahoo Finance's Data Structure
Yahoo Finance organizes financial data around ticker symbols. Each company page (finance.yahoo.com/quote/{TICKER}) serves as a hub linking to multiple data views:
### Quote Page
The main quote page shows the current price, daily change, volume, market cap, P/E ratio, dividend yield, and 52-week range. It also includes a mini chart and recent news.
### Historical Data
Available at /quote/{TICKER}/history/, this section provides daily, weekly, or monthly OHLCV (Open, High, Low, Close, Volume) data going back decades for most stocks.
### Financial Statements
Under the Financials tab (/quote/{TICKER}/financials/), you'll find:
- Income Statement: Revenue, operating income, net income, EPS
- Balance Sheet: Assets, liabilities, equity
- Cash Flow Statement: Operating, investing, and financing cash flows
Each can be viewed annually or quarterly.
### Analysis & Earnings
The Analysis page shows analyst recommendations, price targets, earnings estimates, and revenue estimates. The earnings calendar shows upcoming and past earnings dates with EPS estimates vs actuals.
### News Feed
Yahoo Finance aggregates financial news from multiple sources, with both general market news and stock-specific news on each quote page.
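All of these views hang off the same `/quote/{TICKER}/...` URL pattern, so a small helper (illustrative, not part of any library) can build every page you might want to scrape for a ticker:

```python
def quote_urls(ticker):
    """Build the main Yahoo Finance view URLs for a ticker."""
    base = f"https://finance.yahoo.com/quote/{ticker}"
    return {
        "quote": f"{base}/",
        "history": f"{base}/history/",
        "financials": f"{base}/financials/",
        "analysis": f"{base}/analysis/",
        "news": f"{base}/news/",
    }

print(quote_urls("AAPL")["history"])
# https://finance.yahoo.com/quote/AAPL/history/
```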
## Method 1: Using yfinance (Python Library)
The yfinance library is the easiest way to get started with Yahoo Finance data:
```python
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta


class YahooFinanceExtractor:
    def __init__(self):
        self.cache = {}

    def get_stock_info(self, ticker):
        """Get comprehensive stock information."""
        stock = yf.Ticker(ticker)
        info = stock.info
        return {
            "symbol": ticker,
            "name": info.get("longName"),
            "sector": info.get("sector"),
            "industry": info.get("industry"),
            "market_cap": info.get("marketCap"),
            "current_price": info.get("currentPrice"),
            "pe_ratio": info.get("trailingPE"),
            "forward_pe": info.get("forwardPE"),
            "dividend_yield": info.get("dividendYield"),
            "fifty_two_week_high": info.get("fiftyTwoWeekHigh"),
            "fifty_two_week_low": info.get("fiftyTwoWeekLow"),
            "avg_volume": info.get("averageVolume"),
            "beta": info.get("beta"),
            "earnings_date": info.get("earningsTimestamp"),
            "target_mean_price": info.get("targetMeanPrice"),
            "recommendation": info.get("recommendationKey"),
        }

    def get_historical_prices(self, ticker, period="1y", interval="1d"):
        """Get historical OHLCV data."""
        stock = yf.Ticker(ticker)
        hist = stock.history(period=period, interval=interval)
        records = []
        for date, row in hist.iterrows():
            records.append({
                "date": date.strftime("%Y-%m-%d"),
                "open": round(row["Open"], 2),
                "high": round(row["High"], 2),
                "low": round(row["Low"], 2),
                "close": round(row["Close"], 2),
                "volume": int(row["Volume"]),
            })
        return records

    def get_financial_statements(self, ticker):
        """Get income statement, balance sheet, and cash flow."""
        stock = yf.Ticker(ticker)
        return {
            "income_statement": stock.financials.to_dict() if stock.financials is not None else {},
            "balance_sheet": stock.balance_sheet.to_dict() if stock.balance_sheet is not None else {},
            "cash_flow": stock.cashflow.to_dict() if stock.cashflow is not None else {},
        }

    def get_earnings_data(self, ticker):
        """Get earnings history and estimates."""
        stock = yf.Ticker(ticker)
        earnings_hist = stock.earnings_history
        if earnings_hist is not None and not earnings_hist.empty:
            earnings_list = earnings_hist.to_dict("records")
        else:
            earnings_list = []
        earnings_dates = stock.earnings_dates
        return {
            "earnings_history": earnings_list,
            "earnings_dates": earnings_dates.to_dict("records") if earnings_dates is not None and not earnings_dates.empty else [],
        }

    def get_news(self, ticker):
        """Get recent news for a stock.

        Note: the keys in stock.news have changed across yfinance
        versions (newer releases nest fields under a "content" dict),
        so adjust the field names to match your installed version.
        """
        stock = yf.Ticker(ticker)
        news = stock.news
        articles = []
        for item in news:
            articles.append({
                "title": item.get("title"),
                "publisher": item.get("publisher"),
                "link": item.get("link"),
                "published": item.get("providerPublishTime"),
                "type": item.get("type"),
            })
        return articles


# Usage
extractor = YahooFinanceExtractor()

# Get Apple stock info
info = extractor.get_stock_info("AAPL")
print(f"Company: {info['name']}")
print(f"Price: ${info['current_price']}")
print(f"P/E Ratio: {info['pe_ratio']}")
print(f"Market Cap: ${info['market_cap']:,}")

# Get historical prices
prices = extractor.get_historical_prices("AAPL", period="6mo")
print(f"\nHistorical data points: {len(prices)}")
print(f"Latest: {prices[-1]['date']} - Close: ${prices[-1]['close']}")

# Get financials
financials = extractor.get_financial_statements("AAPL")
print("\nFinancial statements retrieved successfully")
```
## Method 2: Direct Web Scraping with Python
For data that yfinance doesn't expose, or when you need more control, you can scrape Yahoo Finance directly:
```python
import requests
from bs4 import BeautifulSoup
import json
import re
import time


class YahooFinanceScraper:
    BASE_URL = "https://finance.yahoo.com"

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/120.0.0.0 Safari/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        })

    def scrape_quote_page(self, ticker):
        """Scrape the main quote page for real-time data."""
        url = f"{self.BASE_URL}/quote/{ticker}/"
        response = self.session.get(url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Extract price data from the page
        price_el = soup.select_one('[data-testid="qsp-price"]')
        change_el = soup.select_one('[data-testid="qsp-price-change"]')

        # Extract key statistics
        stats = {}
        stat_rows = soup.select('[data-testid="quote-statistics"] li')
        for row in stat_rows:
            label = row.select_one("span:first-child")
            value = row.select_one("span:last-child")
            if label and value:
                stats[label.text.strip()] = value.text.strip()

        return {
            "ticker": ticker,
            "price": price_el.text.strip() if price_el else None,
            "change": change_el.text.strip() if change_el else None,
            "statistics": stats,
        }

    def scrape_historical_data(self, ticker, period1=None, period2=None):
        """Scrape historical price data via Yahoo's CSV download endpoint.

        Note: this v7 download endpoint has become unreliable (Yahoo now
        gates it behind cookies and a "crumb" token); if it returns 401,
        fall back to the v8 chart API shown in the Node.js example.
        """
        if period2 is None:
            period2 = int(time.time())
        if period1 is None:
            period1 = period2 - (365 * 24 * 60 * 60)  # 1 year ago
        url = (f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}"
               f"?period1={period1}&period2={period2}"
               f"&interval=1d&events=history")
        response = self.session.get(url)
        if response.status_code == 200:
            lines = response.text.strip().split("\n")
            headers = lines[0].split(",")
            data = []
            for line in lines[1:]:
                values = line.split(",")
                data.append(dict(zip(headers, values)))
            return data
        return []

    def scrape_financials(self, ticker, statement="income"):
        """Scrape financial statements from the financials page."""
        statement_map = {
            "income": "financials",
            "balance": "balance-sheet",
            "cashflow": "cash-flow",
        }
        slug = statement_map.get(statement, "financials")
        url = f"{self.BASE_URL}/quote/{ticker}/{slug}/"
        response = self.session.get(url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Yahoo Finance renders financials via JavaScript, so we look for
        # the embedded JSON data. The "root.App.main" blob belongs to the
        # legacy page layout; inspect the current markup to confirm where
        # the data lives before relying on this.
        scripts = soup.find_all("script")
        for script in scripts:
            if script.string and "root.App.main" in script.string:
                json_str = re.search(
                    r"root\.App\.main\s*=\s*({.*?});",
                    script.string,
                    re.DOTALL,  # the JSON blob spans multiple lines
                )
                if json_str:
                    data = json.loads(json_str.group(1))
                    return self._extract_financial_data(data, ticker)
        return {}

    def _extract_financial_data(self, data, ticker):
        """Pull the statement data out of the embedded JSON blob.

        The nesting below matches the legacy page structure and will
        need updating if Yahoo changes it.
        """
        stores = data.get("context", {}).get("dispatcher", {}).get("stores", {})
        return stores.get("QuoteSummaryStore", {})

    def scrape_news(self, ticker):
        """Scrape news articles for a specific stock."""
        url = f"{self.BASE_URL}/quote/{ticker}/news/"
        response = self.session.get(url)
        soup = BeautifulSoup(response.text, "html.parser")
        articles = []
        news_items = soup.select("section.container li")
        for item in news_items:
            title_el = item.select_one("h3")
            link_el = item.select_one("a")
            source_el = item.select_one(".publishing")
            if title_el:
                articles.append({
                    "title": title_el.text.strip(),
                    "url": link_el["href"] if link_el else None,
                    "source": source_el.text.strip() if source_el else None,
                })
        return articles


# Usage
scraper = YahooFinanceScraper()

# Scrape Tesla quote
quote = scraper.scrape_quote_page("TSLA")
print(f"TSLA Price: {quote['price']}")
print(f"Change: {quote['change']}")
for key, value in quote['statistics'].items():
    print(f"  {key}: {value}")

# Get historical data
history = scraper.scrape_historical_data("TSLA")
print(f"\nHistorical records: {len(history)}")
```
## Method 3: Node.js Scraping
For JavaScript developers, here's a Node.js approach:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

class YahooFinanceScraper {
  constructor() {
    this.baseUrl = 'https://finance.yahoo.com';
    this.headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept-Language': 'en-US,en;q=0.9',
    };
  }

  async getQuote(ticker) {
    const url = `${this.baseUrl}/quote/${ticker}/`;
    const { data } = await axios.get(url, { headers: this.headers });
    const $ = cheerio.load(data);

    const price = $('[data-testid="qsp-price"]').text().trim();
    const change = $('[data-testid="qsp-price-change"]').text().trim();

    const stats = {};
    $('[data-testid="quote-statistics"] li').each((_, el) => {
      const spans = $(el).find('span');
      if (spans.length >= 2) {
        const label = spans.first().text().trim();
        const value = spans.last().text().trim();
        stats[label] = value;
      }
    });

    return { ticker, price, change, stats };
  }

  async getHistoricalPrices(ticker, range = '1y') {
    // Use Yahoo Finance's v8 chart API for historical data
    const url = `https://query1.finance.yahoo.com/v8/finance/chart/${ticker}`;
    const params = { range, interval: '1d' };
    try {
      const { data } = await axios.get(url, {
        headers: this.headers,
        params,
      });
      const result = data.chart.result[0];
      const timestamps = result.timestamp;
      const quote = result.indicators.quote[0];
      return timestamps.map((ts, i) => ({
        date: new Date(ts * 1000).toISOString().split('T')[0],
        open: quote.open[i]?.toFixed(2),
        high: quote.high[i]?.toFixed(2),
        low: quote.low[i]?.toFixed(2),
        close: quote.close[i]?.toFixed(2),
        volume: quote.volume[i],
      }));
    } catch (error) {
      console.error(`Error fetching historical data: ${error.message}`);
      return [];
    }
  }

  async getNews(ticker) {
    const url = `${this.baseUrl}/quote/${ticker}/news/`;
    const { data } = await axios.get(url, { headers: this.headers });
    const $ = cheerio.load(data);
    const articles = [];
    $('section.container li').each((_, el) => {
      const title = $(el).find('h3').text().trim();
      const link = $(el).find('a').attr('href');
      const source = $(el).find('.publishing').text().trim();
      if (title) {
        articles.push({ title, url: link, source });
      }
    });
    return articles;
  }

  async getMultipleQuotes(tickers) {
    return Promise.all(
      tickers.map(async (ticker) => {
        try {
          return await this.getQuote(ticker);
        } catch (err) {
          return { ticker, error: err.message };
        }
      })
    );
  }
}

// Usage
(async () => {
  const scraper = new YahooFinanceScraper();

  // Get multiple quotes
  const tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN'];
  const quotes = await scraper.getMultipleQuotes(tickers);
  quotes.forEach(q => {
    if (!q.error) {
      console.log(`${q.ticker}: $${q.price} (${q.change})`);
    }
  });

  // Get historical data
  const history = await scraper.getHistoricalPrices('AAPL', '6mo');
  console.log(`\nHistorical data points: ${history.length}`);
  if (history.length > 0) {
    const latest = history[history.length - 1];
    console.log(`Latest: ${latest.date} - $${latest.close}`);
  }
})();
```
## Extracting Financial Statements in Detail
Financial statements are among the most valuable data on Yahoo Finance. Here's a specialized approach:
```python
import yfinance as yf
import pandas as pd


def extract_detailed_financials(ticker):
    """Extract and structure detailed financial data."""
    stock = yf.Ticker(ticker)

    # Income Statement
    income = stock.financials
    quarterly_income = stock.quarterly_financials

    # Balance Sheet
    balance = stock.balance_sheet
    quarterly_balance = stock.quarterly_balance_sheet

    # Cash Flow
    cashflow = stock.cashflow
    quarterly_cashflow = stock.quarterly_cashflow

    # Key metrics derived from financial data
    if income is not None and not income.empty:
        latest = income.iloc[:, 0]  # Most recent year
        revenue = latest.get("Total Revenue", 0)
        net_income = latest.get("Net Income", 0)
        operating_income = latest.get("Operating Income", 0)
        metrics = {
            "revenue": revenue,
            "net_income": net_income,
            "operating_income": operating_income,
            "profit_margin": round(net_income / revenue * 100, 2) if revenue else 0,
            "operating_margin": round(operating_income / revenue * 100, 2) if revenue else 0,
        }
    else:
        metrics = {}

    # Growth rates (year over year)
    if income is not None and not income.empty and income.shape[1] >= 2:
        current_rev = income.iloc[:, 0].get("Total Revenue", 0)
        prev_rev = income.iloc[:, 1].get("Total Revenue", 0)
        if prev_rev:
            metrics["revenue_growth"] = round(
                (current_rev - prev_rev) / prev_rev * 100, 2
            )

    return {
        "ticker": ticker,
        "key_metrics": metrics,
        "annual_income_statement": income.to_dict() if income is not None else {},
        "quarterly_income_statement": quarterly_income.to_dict() if quarterly_income is not None else {},
        "annual_balance_sheet": balance.to_dict() if balance is not None else {},
        "annual_cash_flow": cashflow.to_dict() if cashflow is not None else {},
    }


# Extract and display financials
data = extract_detailed_financials("AAPL")
print(f"Revenue: ${data['key_metrics'].get('revenue', 0):,.0f}")
print(f"Net Income: ${data['key_metrics'].get('net_income', 0):,.0f}")
print(f"Profit Margin: {data['key_metrics'].get('profit_margin', 0)}%")
print(f"Revenue Growth: {data['key_metrics'].get('revenue_growth', 'N/A')}%")
```
## Scaling with Apify
For production-grade Yahoo Finance scraping, Apify provides the infrastructure to handle high volumes reliably. Here's an Apify actor for Yahoo Finance:
```javascript
const { Actor } = require('apify');
const { CheerioCrawler } = require('crawlee');

Actor.main(async () => {
  const input = (await Actor.getInput()) ?? {};
  const {
    tickers = ['AAPL', 'GOOGL', 'MSFT'],
    scrapeHistorical = true,
    scrapeFinancials = true,
    scrapeNews = true,
  } = input;

  const dataset = await Actor.openDataset('yahoo-finance-data');

  const crawler = new CheerioCrawler({
    maxConcurrency: 3, // Be gentle with Yahoo Finance
    maxRequestRetries: 3,
    async requestHandler({ request, $, log }) {
      const { ticker, dataType } = request.userData;

      if (dataType === 'quote') {
        const price = $('[data-testid="qsp-price"]').text().trim();
        const change = $('[data-testid="qsp-price-change"]').text().trim();
        const stats = {};
        $('[data-testid="quote-statistics"] li').each((_, el) => {
          const spans = $(el).find('span');
          if (spans.length >= 2) {
            stats[spans.first().text().trim()] = spans.last().text().trim();
          }
        });
        await dataset.pushData({
          type: 'quote',
          ticker,
          price,
          change,
          statistics: stats,
          scrapedAt: new Date().toISOString(),
        });
        log.info(`Scraped quote for ${ticker}: $${price}`);
      } else if (dataType === 'news') {
        const articles = [];
        $('section.container li').each((_, el) => {
          const title = $(el).find('h3').text().trim();
          const link = $(el).find('a').attr('href');
          if (title) {
            articles.push({ title, url: link });
          }
        });
        await dataset.pushData({
          type: 'news',
          ticker,
          articles,
          scrapedAt: new Date().toISOString(),
        });
        log.info(`Scraped ${articles.length} news articles for ${ticker}`);
      }
    },
  });

  // Build request list
  const requests = [];
  for (const ticker of tickers) {
    requests.push({
      url: `https://finance.yahoo.com/quote/${ticker}/`,
      userData: { ticker, dataType: 'quote' },
    });
    if (scrapeNews) {
      requests.push({
        url: `https://finance.yahoo.com/quote/${ticker}/news/`,
        userData: { ticker, dataType: 'news' },
      });
    }
  }

  await crawler.run(requests);
  // The crawler's `log` is only in scope inside requestHandler
  console.log(`Scraping complete for ${tickers.length} tickers`);
});
```
### Why Use Apify for Yahoo Finance?
- Proxy management: Yahoo Finance aggressively blocks scrapers. Apify's proxy pool ensures consistent access.
- Scheduling: Set up daily or hourly scraping runs to maintain fresh market data.
- Data export: Export to JSON, CSV, or push directly to your database via webhooks.
- Monitoring: Get alerts when scraping fails, so you never miss market data.
- Scalability: Scrape hundreds of tickers simultaneously without infrastructure headaches.
## Building a Stock Screener
Combine all the techniques above to build a powerful stock screener:
```python
import yfinance as yf
import json


def screen_stocks(tickers, criteria):
    """Screen stocks based on financial criteria."""
    results = []
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            info = stock.info

            # Apply screening criteria
            passes = True
            stock_data = {"ticker": ticker, "name": info.get("longName")}
            for metric, (min_val, max_val) in criteria.items():
                value = info.get(metric)
                if value is None:
                    passes = False
                    break
                if min_val is not None and value < min_val:
                    passes = False
                    break
                if max_val is not None and value > max_val:
                    passes = False
                    break
                stock_data[metric] = value

            if passes:
                results.append(stock_data)
        except Exception as e:
            print(f"Error processing {ticker}: {e}")
    return results


# Define screening criteria.
# Note: check your yfinance version's units for dividendYield — older
# releases return a fraction (0.02 = 2%), newer ones a percentage (2.0).
criteria = {
    "trailingPE": (5, 25),          # P/E between 5 and 25
    "dividendYield": (0.02, None),  # Dividend yield > 2%
    "marketCap": (10e9, None),      # Market cap > $10B
    "beta": (None, 1.5),            # Beta < 1.5
}

# Screen S&P 500 stocks (sample)
tickers = ["AAPL", "MSFT", "JNJ", "PG", "KO", "PEP", "XOM", "CVX", "JPM", "BAC"]
matches = screen_stocks(tickers, criteria)

print(f"Stocks matching criteria: {len(matches)}")
for stock in matches:
    print(f"  {stock['ticker']}: {stock['name']}")
    print(f"    P/E: {stock.get('trailingPE', 'N/A'):.1f}")
    print(f"    Div Yield: {stock.get('dividendYield', 0)*100:.1f}%")
```
## Handling Common Challenges
### Rate Limiting
Yahoo Finance rate-limits aggressive requests. Solutions include adding delays between requests (2-5 seconds), rotating user agents, using proxy services, and caching responses to avoid redundant requests.
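The delay and caching suggestions can be combined in a small wrapper that throttles outgoing requests and reuses cached responses. A minimal sketch — the class name and delay value are illustrative, not part of any library:

```python
import time

class PoliteFetcher:
    """Throttle calls to a fetch function and cache results by URL."""

    def __init__(self, fetch, min_delay=2.0):
        self.fetch = fetch          # e.g. lambda url: session.get(url)
        self.min_delay = min_delay  # seconds between real requests
        self.last_request = 0.0
        self.cache = {}

    def get(self, url):
        if url in self.cache:       # cache hit: no delay, no request
            return self.cache[url]
        wait = self.min_delay - (time.monotonic() - self.last_request)
        if wait > 0:
            time.sleep(wait)        # space out real requests
        result = self.fetch(url)
        self.last_request = time.monotonic()
        self.cache[url] = result
        return result
```

Rotating user agents and proxies can be layered on top by varying the headers inside the fetch callable.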
### Dynamic Content
Some Yahoo Finance data loads via JavaScript. For these sections, consider using Puppeteer or Playwright to render the page, extracting data from embedded JSON in script tags, or using Yahoo Finance's undocumented API endpoints.
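The middle option — pulling embedded JSON out of a script tag — looks roughly like this. The `marker` is whatever variable the page assigns the blob to; inspect the live HTML to find it, since it changes as Yahoo redesigns pages:

```python
import json
import re

def extract_embedded_json(html, marker):
    """Extract a JSON object assigned to `marker` in an inline <script>.

    A sketch: the non-greedy regex breaks if the JSON contains a literal
    "};" inside a string, so verify against the page you're scraping.
    """
    pattern = re.escape(marker) + r"\s*=\s*({.*?});"
    match = re.search(pattern, html, re.DOTALL)
    return json.loads(match.group(1)) if match else None
```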
### Data Accuracy
Always cross-reference scraped financial data with official SEC filings. Market data may have slight delays. Use multiple data sources for critical financial decisions.
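A cheap automated check is to fetch the same figure from two sources and flag anything that diverges beyond a tolerance (a sketch; the 1% threshold is arbitrary):

```python
def sources_agree(a, b, max_pct_diff=1.0):
    """Return True if two quotes for the same figure are within tolerance."""
    if a is None or b is None or (a == 0 and b == 0):
        return a == b
    return abs(a - b) / max(abs(a), abs(b)) * 100 <= max_pct_diff
```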
## Ethical Considerations
- Terms of Service: Review Yahoo Finance's ToS regarding automated data collection.
- Rate limiting: Always implement respectful delays. Don't hammer their servers.
- Data usage: Financial data may have redistribution restrictions. Check licensing.
- Not financial advice: Scraped data should supplement, not replace, professional financial analysis.
- Personal data: Avoid scraping or storing user comments or profile data without consent.
## Conclusion
Yahoo Finance offers an incredible depth of financial data that, when extracted programmatically, can power everything from personal stock screeners to institutional-grade research platforms. Whether you start with the yfinance Python library for quick prototyping, build custom scrapers for specialized needs, or scale up with Apify's cloud infrastructure, the techniques covered in this guide provide a solid foundation.
Remember to scrape responsibly, respect rate limits, verify your data against multiple sources, and always comply with terms of service. The financial data landscape is rich — with the right tools and approach, you can build powerful data pipelines that keep you ahead of the market.
Happy scraping, and may your portfolios prosper!