# How to Build an Earnings Call Transcript Analyzer
Earnings calls contain signals that can move stock prices: hedging language, changes in forward guidance, and sentiment shifts. Most investors still read transcripts manually. We'll build a Python tool that scrapes transcripts and extracts these signals automatically.
## The Value of Automated Transcript Analysis
When a CEO shifts from "we expect strong growth" to "we anticipate headwinds," that language change often precedes stock movement. NLP can detect these shifts across hundreds of companies simultaneously, something no human analyst can do at scale.
## Scraping Earnings Transcripts
Several sites publish free earnings call transcripts. We'll build a scraper that collects them systematically using ScraperAPI for reliable proxy rotation:
```python
import re

import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = "YOUR_KEY"

def scrape_transcript(url):
    response = requests.get(
        "http://api.scraperapi.com",
        params={
            "api_key": SCRAPER_API_KEY,
            "url": url,
            "render": "true",  # render JavaScript-heavy transcript pages
        },
        timeout=60,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    segments = []
    current_speaker = None
    current_role = ""
    current_text = []

    for element in soup.find_all(["p", "strong", "b"]):
        text = element.get_text(strip=True)
        # Speaker lines look like "Jane Doe - Chief Executive Officer"
        speaker_match = re.match(r"^([A-Z][a-z]+ [A-Z][a-z]+)\s*[-\u2014]\s*(.+)", text)
        if speaker_match:
            if current_speaker:
                segments.append({
                    "speaker": current_speaker,
                    "role": current_role,
                    "text": " ".join(current_text),
                })
            current_speaker = speaker_match.group(1)
            current_role = speaker_match.group(2)
            current_text = []
        elif current_speaker:
            current_text.append(text)

    # Flush the final speaker's remarks, which the loop never appends
    if current_speaker and current_text:
        segments.append({
            "speaker": current_speaker,
            "role": current_role,
            "text": " ".join(current_text),
        })
    return segments
```
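The speaker regex is the fragile part of this scraper: it assumes "First Last - Title" lines separated by a hyphen or em dash. A quick check against sample lines (the names are invented) shows what it does and does not catch:

```python
import re

SPEAKER_RE = re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+)\s*[-\u2014]\s*(.+)")

samples = [
    "Jane Doe - Chief Executive Officer",          # matches: hyphen separator
    "John Smith \u2014 Chief Financial Officer",   # matches: em dash separator
    "Operator",                                    # no match: no dash-separated role
    "Thank you, and good morning everyone.",       # no match: ordinary prose
]

for line in samples:
    m = SPEAKER_RE.match(line)
    print(m.groups() if m else None)
```

Note the regex misses three-part names, middle initials, and all-caps headers; extend the pattern for the specific transcript site you target.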
## Sentiment Analysis Engine

We score sentiment at the segment level, matching each speaker's words against curated term lists to catch nuanced shifts:
```python
import re

POSITIVE_TERMS = {
    "growth", "exceeded", "strong", "record", "confident",
    "momentum", "opportunity", "upside", "robust", "accelerating",
    "outperformed", "beat", "raised", "guidance", "optimistic",
}

NEGATIVE_TERMS = {
    "headwinds", "challenging", "decline", "pressure", "uncertainty",
    "cautious", "softness", "deceleration", "missed", "below",
    "restructuring", "impairment", "risk", "concern", "weakness",
}

HEDGING_TERMS = {
    "may", "might", "could", "potentially", "possibly",
    "approximately", "somewhat", "relatively", "generally",
}

def analyze_segment(segment):
    # Unique words only: each term counts once per segment
    words = set(re.findall(r"\b\w+\b", segment["text"].lower()))
    pos_count = len(words & POSITIVE_TERMS)
    neg_count = len(words & NEGATIVE_TERMS)
    hedge_count = len(words & HEDGING_TERMS)
    total = (pos_count + neg_count) or 1  # avoid division by zero
    return {
        "speaker": segment["speaker"],
        "role": segment["role"],
        "sentiment": (pos_count - neg_count) / total,
        "hedging_ratio": hedge_count / max(len(words), 1),
        "positive_terms": sorted(words & POSITIVE_TERMS),
        "negative_terms": sorted(words & NEGATIVE_TERMS),
        "word_count": len(words),
    }
```
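A quick sanity check on an invented sentence makes the arithmetic concrete. `analyze_text` below is a trimmed stand-in for `analyze_segment`, with the term sets reduced to the words this example actually hits:

```python
import re

POSITIVE_TERMS = {"strong", "growth"}   # trimmed for the example
NEGATIVE_TERMS = {"pressure"}
HEDGING_TERMS = {"may"}

def analyze_text(text):
    # Same scoring as analyze_segment, applied to bare text
    words = set(re.findall(r"\b\w+\b", text.lower()))
    pos = len(words & POSITIVE_TERMS)
    neg = len(words & NEGATIVE_TERMS)
    hedge = len(words & HEDGING_TERMS)
    total = (pos + neg) or 1
    return (pos - neg) / total, hedge / max(len(words), 1)

sentiment, hedging = analyze_text(
    "We saw strong growth this quarter, but margins may face pressure."
)
print(round(sentiment, 3))  # 0.333: two positive terms vs one negative
print(round(hedging, 3))    # 0.091: "may" out of 11 unique words
```

Because `words` is a set, repeating "strong" ten times scores the same as saying it once; switch to a `Counter` if you want frequency-weighted scores.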
## Forward Guidance Detection
Extract and compare forward-looking statements:
```python
GUIDANCE_ROLES = ("ceo", "cfo", "chief executive", "chief financial")

def extract_guidance(segments):
    guidance_patterns = [
        r"(?:we expect|we anticipate|we project|we forecast|guidance).{10,200}",
        r"(?:next quarter|full year|fiscal year).{10,200}",
        r"(?:revenue.{0,30}(?:between|range|expect)|eps.{0,30}(?:between|range|expect)).{10,100}",
    ]
    guidance_statements = []
    for segment in segments:
        role = segment.get("role", "").lower()
        # Substring match: scraped roles are full titles like
        # "Chief Executive Officer", not the bare string "ceo"
        if any(key in role for key in GUIDANCE_ROLES):
            for pattern in guidance_patterns:
                for match in re.findall(pattern, segment["text"].lower()):
                    guidance_statements.append({
                        "speaker": segment["speaker"],
                        "role": segment["role"],
                        "statement": match.strip(),
                        "type": "forward_guidance",
                    })
    return guidance_statements
```
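The first pattern can be exercised directly; the revenue figures below are invented, and the text is lowercased as it would be inside `extract_guidance`:

```python
import re

pattern = r"(?:we expect|we anticipate|we project|we forecast|guidance).{10,200}"
text = (
    "we expect full-year revenue of $4.1 billion to $4.3 billion, "
    "up roughly 8% at the midpoint."
)
matches = re.findall(pattern, text)
print(matches[0])  # the trigger phrase plus up to 200 trailing characters
```

The `.{10,200}` tail is a blunt instrument: it grabs a fixed window rather than a sentence, so expect some statements to be clipped mid-thought. Sentence tokenization (e.g. NLTK's `sent_tokenize`) is a cleaner alternative if you need whole statements.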
## Quarter-over-Quarter Comparison
The real value emerges when you compare transcripts across quarters:
```python
def compare_quarters(current_analysis, previous_analysis):
    changes = {}
    for speaker in {a["speaker"] for a in current_analysis}:
        curr = [a for a in current_analysis if a["speaker"] == speaker]
        prev = [a for a in previous_analysis if a["speaker"] == speaker]
        if curr and prev:
            # Compares each speaker's first segment quarter-over-quarter;
            # averaging across all of a speaker's segments is a natural refinement
            sentiment_shift = curr[0]["sentiment"] - prev[0]["sentiment"]
            hedging_shift = curr[0]["hedging_ratio"] - prev[0]["hedging_ratio"]
            changes[speaker] = {
                "sentiment_change": round(sentiment_shift, 3),
                "hedging_change": round(hedging_shift, 3),
                "signal": (
                    "BEARISH" if sentiment_shift < -0.2 or hedging_shift > 0.05
                    else "BULLISH" if sentiment_shift > 0.2
                    else "NEUTRAL"
                ),
            }
    return changes
```
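The thresholds (±0.2 sentiment, +0.05 hedging) are judgment calls, not calibrated values, so it helps to isolate them where they're easy to see and tune. Note the asymmetry: a hedging jump alone is bearish, but falling hedging is never bullish on its own:

```python
def classify(sentiment_shift, hedging_shift):
    # Same thresholds as compare_quarters above
    if sentiment_shift < -0.2 or hedging_shift > 0.05:
        return "BEARISH"
    if sentiment_shift > 0.2:
        return "BULLISH"
    return "NEUTRAL"

print(classify(-0.25, 0.00))  # BEARISH: sentiment dropped sharply
print(classify(0.10, 0.06))   # BEARISH: hedging jumped even though sentiment rose
print(classify(0.30, 0.00))   # BULLISH
print(classify(0.10, 0.01))   # NEUTRAL: both shifts inside the dead zone
```

Backtesting against historical transcripts is the right way to replace these guesses with calibrated cutoffs.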
## Scaling the Analysis
To analyze hundreds of earnings calls per quarter, you need robust proxy infrastructure. ThorData provides residential proxies for scraping financial sites, and ScrapeOps monitors scraper performance across sources.
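Whichever provider handles the proxying, the fan-out itself is a plain thread pool. A sketch, with `scrape_transcript` stubbed so the snippet runs standalone (swap in the real function from above), and placeholder URLs:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_transcript(url):
    # Stub standing in for the real scraper defined earlier
    return [{"speaker": "Jane Doe", "role": "CEO", "text": "strong growth"}]

urls = [f"https://example.com/transcript/{i}" for i in range(10)]  # placeholders

results = {}
with ThreadPoolExecutor(max_workers=5) as pool:  # keep concurrency modest per proxy plan
    futures = {pool.submit(scrape_transcript, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            results[url] = fut.result()
        except Exception as exc:  # one failed scrape shouldn't kill the batch
            results[url] = {"error": str(exc)}

print(len(results))  # 10
```

Threads are the right tool here because the work is network-bound; tune `max_workers` to whatever concurrency your proxy plan allows.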
## Putting It All Together
```python
def full_analysis_pipeline(transcript_url):
    segments = scrape_transcript(transcript_url)
    analysis = [analyze_segment(s) for s in segments]
    guidance = extract_guidance(segments)
    # Match "CEO" and full titles like "Chief Executive Officer"
    ceo_sentiment = [
        a for a in analysis
        if any(key in a["role"].lower() for key in ("ceo", "chief executive"))
    ]
    overall = sum(a["sentiment"] for a in analysis) / len(analysis) if analysis else 0
    return {
        "overall_sentiment": round(overall, 3),
        "ceo_sentiment": ceo_sentiment,
        "guidance_statements": guidance,
        "segment_count": len(segments),
    }
```
## Key Insights
- Hedging increases in the Q&A section often predict earnings misses next quarter
- CEO vs CFO sentiment divergence can signal internal disagreements about outlook
- Word count changes (shorter answers) sometimes indicate discomfort with a topic
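Measuring Q&A hedging separately (the first bullet) requires splitting the call at the Q&A boundary. Many transcripts mark it with an operator line such as "Question-and-Answer Session"; this heuristic split assumes that convention, and the sample segments are invented:

```python
def split_prepared_vs_qa(segments):
    """Split segments at the first Q&A marker (heuristic)."""
    markers = ("question-and-answer", "question and answer", "q&a")
    for i, seg in enumerate(segments):
        blob = (seg.get("text", "") + " " + seg.get("role", "")).lower()
        if any(m in blob for m in markers):
            return segments[:i], segments[i:]
    return segments, []  # no marker found: treat the whole call as prepared remarks

segments = [
    {"speaker": "Jane Doe", "role": "CEO", "text": "Prepared remarks here."},
    {"speaker": "Operator", "role": "Operator",
     "text": "We will now begin the question-and-answer session."},
    {"speaker": "Sam Lee", "role": "Analyst", "text": "My question is about margins."},
]
prepared, qa = split_prepared_vs_qa(segments)
print(len(prepared), len(qa))  # 1 2
```

Run `analyze_segment` over each half separately and compare hedging ratios: scripted remarks are polished by investor relations, so a hedging spike in the unscripted Q&A is the more telling signal.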
Build this analyzer, run it on historical transcripts, and backtest the signals against subsequent stock performance before trading on any of them.