DEV Community

agenthustler

How to Build an Earnings Call Transcript Analyzer

Earnings calls contain signals that move stock prices — hedging language, forward guidance changes, and sentiment shifts. Most investors read transcripts manually. We'll build a Python tool that scrapes transcripts and extracts actionable signals automatically.

The Value of Automated Transcript Analysis

When a CEO shifts from "we expect strong growth" to "we anticipate headwinds," that change in language can precede a move in the stock. NLP can detect these shifts across hundreds of companies at once, a scale no human analyst can match by hand.

Scraping Earnings Transcripts

Several sites publish free earnings call transcripts. We'll build a scraper that collects them systematically using ScraperAPI for reliable proxy rotation:

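import re

import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = "YOUR_KEY"

# Speaker headers look like "Jane Smith - Chief Financial Officer".
# Note: this pattern only matches two-word capitalized names.
SPEAKER_RE = re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+)\s*[-\u2014]\s*(.+)")

def scrape_transcript(url):
    # Fetch the rendered page through ScraperAPI.
    response = requests.get(
        "https://api.scraperapi.com",
        params={
            "api_key": SCRAPER_API_KEY,
            "url": url,
            "render": "true"
        },
        timeout=60
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")
    segments = []
    current_speaker = None
    current_text = []
    current_role = ""

    for element in soup.find_all(["p", "strong", "b"]):
        # Bold text inside a <p> is already part of that paragraph's text,
        # so skip it to avoid duplicate speaker headers and duplicate text.
        if element.name in ("strong", "b") and element.find_parent("p"):
            continue
        text = element.get_text(strip=True)
        if not text:
            continue
        speaker_match = SPEAKER_RE.match(text)
        if speaker_match:
            if current_speaker:
                segments.append({
                    "speaker": current_speaker,
                    "role": current_role,
                    "text": " ".join(current_text)
                })
            current_speaker = speaker_match.group(1)
            current_role = speaker_match.group(2)
            current_text = []
        elif current_speaker:
            current_text.append(text)

    # Store the final speaker turn, which has no following header to trigger it.
    if current_speaker:
        segments.append({
            "speaker": current_speaker,
            "role": current_role,
            "text": " ".join(current_text)
        })
    return segments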
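To see what the speaker pattern accepts without making any network calls, here is a small offline sketch. The sample lines are invented and `parse_speaker_line` is a hypothetical helper, not part of the scraper; the regex is the one used above, so it only recognizes two-word capitalized names.

```python
import re

# Same pattern as the scraper: "First Last", a hyphen or em dash, then the role.
SPEAKER_RE = re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+)\s*[-\u2014]\s*(.+)")

def parse_speaker_line(line):
    """Return (speaker, role) if the line looks like a speaker header, else None."""
    match = SPEAKER_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Invented sample lines, for illustration only.
for line in [
    "Jane Smith - Chief Financial Officer",
    "Thank you, operator.",
    "Mary O'Neil - Analyst",
]:
    print(repr(line), "->", parse_speaker_line(line))
```

The third line is not recognized because of the apostrophe in the surname, so real transcripts with initials, hyphenated names, or three-part names would likely need a looser pattern.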

Sentiment Analysis Engine

We score each speaker segment separately, so shifts in tone can be traced to a specific person and role:

from collections import Counter
import re

POSITIVE_TERMS = {
    "growth", "exceeded", "strong", "record", "confident",
    "momentum", "opportunity", "upside", "robust", "accelerating",
    "outperformed", "beat", "raised", "guidance", "optimistic"
}

NEGATIVE_TERMS = {
    "headwinds", "challenging", "decline", "pressure", "uncertainty",
    "cautious", "softness", "deceleration", "missed", "below",
    "restructuring", "impairment", "risk", "concern", "weakness"
}

HEDGING_TERMS = {
    "may", "might", "could", "potentially", "possibly",
    "approximately", "somewhat", "relatively", "generally"
}

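def analyze_segment(segment):
    # Tokenize once and keep duplicates, so repeated terms and answer length count.
    tokens = re.findall(r'\b\w+\b', segment["text"].lower())
    counts = Counter(tokens)
    pos_count = sum(counts[t] for t in POSITIVE_TERMS)
    neg_count = sum(counts[t] for t in NEGATIVE_TERMS)
    hedge_count = sum(counts[t] for t in HEDGING_TERMS)
    # Score in [-1, 1]; 0 when the segment contains no sentiment terms.
    total = pos_count + neg_count or 1
    sentiment_score = (pos_count - neg_count) / total

    return {
        "speaker": segment["speaker"],
        "role": segment["role"],
        "sentiment": sentiment_score,
        # Share of all words in the segment that are hedging terms.
        "hedging_ratio": hedge_count / max(len(tokens), 1),
        "positive_terms": sorted(t for t in POSITIVE_TERMS if counts[t]),
        "negative_terms": sorted(t for t in NEGATIVE_TERMS if counts[t]),
        # Total words spoken, not unique words.
        "word_count": len(tokens)
    }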
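If you want finer granularity than a whole speaker turn, the same lexicon idea can be applied per sentence. The following is a minimal sketch under stated assumptions: `split_sentences` and `score_sentences` are hypothetical helpers, the sentence splitter is deliberately naive (it will mis-split abbreviations such as "Inc."), and the two small term sets stand in for the fuller ones above.

```python
import re

# Tiny illustrative lexicons; a real run would reuse the fuller sets.
POSITIVE = {"growth", "strong", "record", "confident"}
NEGATIVE = {"headwinds", "pressure", "decline", "uncertainty"}

def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentences(text):
    """Return (sentence, score) pairs, where score is in [-1, 1]."""
    results = []
    for sentence in split_sentences(text):
        words = re.findall(r"\b\w+\b", sentence.lower())
        pos = sum(1 for w in words if w in POSITIVE)
        neg = sum(1 for w in words if w in NEGATIVE)
        total = pos + neg
        score = (pos - neg) / total if total else 0.0
        results.append((sentence, score))
    return results

# Invented sample text, for illustration only.
sample = "We delivered record growth this quarter. We do see pressure on margins."
for sentence, score in score_sentences(sample):
    print(f"{score:+.1f}  {sentence}")
```

Scoring per sentence makes it easier to spot a single cautious remark inside an otherwise upbeat answer, at the cost of noisier scores on short sentences.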

Forward Guidance Detection

Extract forward-looking statements from executive remarks so they can be compared across quarters:

# Matched as substrings, so "Chief Financial Officer" and "CEO and Chairman" both qualify.
EXECUTIVE_KEYWORDS = ("ceo", "cfo", "chief executive", "chief financial")

def extract_guidance(segments):
    guidance_patterns = [
        r"(?:we expect|we anticipate|we project|we forecast|guidance).{10,200}",
        r"(?:next quarter|full year|fiscal year).{10,200}",
        r"(?:revenue.{0,30}(?:between|range|expect)|eps.{0,30}(?:between|range|expect)).{10,100}",
    ]
    guidance_statements = []
    for segment in segments:
        role = segment.get("role", "").lower()
        # Only executives give guidance; skip analysts and the operator.
        if not any(keyword in role for keyword in EXECUTIVE_KEYWORDS):
            continue
        text = segment["text"].lower()
        for pattern in guidance_patterns:
            # A sentence can match more than one pattern, so duplicates are possible.
            for match in re.findall(pattern, text):
                guidance_statements.append({
                    "speaker": segment["speaker"],
                    "role": segment["role"],
                    "statement": match.strip(),
                    "type": "forward_guidance"
                })
    return guidance_statements
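Guidance statements are easier to compare once the numbers are pulled out of the text. Here is a rough sketch of one way to do that for dollar ranges; `parse_dollar_range` is a hypothetical helper, the regex covers only a couple of common phrasings, and it assumes both ends of the range share the unit stated after the second figure.

```python
import re

# Matches phrases such as "$4.2 billion and $4.4 billion" or "$950 to $980 million".
RANGE_RE = re.compile(
    r"\$(\d+(?:\.\d+)?)\s*(?:billion|million)?\s*(?:to|and|-)\s*"
    r"\$(\d+(?:\.\d+)?)\s*(billion|million)"
)

def parse_dollar_range(statement):
    """Return {'low', 'high', 'unit'} for the first dollar range found, else None."""
    match = RANGE_RE.search(statement.lower())
    if match is None:
        return None
    return {
        "low": float(match.group(1)),
        "high": float(match.group(2)),
        "unit": match.group(3),  # assumed to apply to both ends of the range
    }

# Invented sample statements, for illustration only.
print(parse_dollar_range("we expect revenue between $4.2 billion and $4.4 billion for the full year"))
print(parse_dollar_range("guidance of $950 to $980 million next quarter"))
print(parse_dollar_range("we anticipate continued headwinds in europe"))
```

With ranges parsed for two consecutive quarters, a raised or lowered midpoint becomes a simple numeric comparison instead of a text diff.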

Quarter-over-Quarter Comparison

The real value emerges when you compare transcripts across quarters:

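def compare_quarters(current_analysis, previous_analysis):
    def speaker_mean(analysis, speaker, key):
        # Average a metric over all of one speaker's segments on a call.
        values = [a[key] for a in analysis if a["speaker"] == speaker]
        return sum(values) / len(values) if values else None

    changes = {}
    for speaker in {a["speaker"] for a in current_analysis}:
        prev_sentiment = speaker_mean(previous_analysis, speaker, "sentiment")
        if prev_sentiment is None:
            continue  # speaker was not on the previous call
        sentiment_shift = (
            speaker_mean(current_analysis, speaker, "sentiment") - prev_sentiment
        )
        hedging_shift = (
            speaker_mean(current_analysis, speaker, "hedging_ratio")
            - speaker_mean(previous_analysis, speaker, "hedging_ratio")
        )
        changes[speaker] = {
            "sentiment_change": round(sentiment_shift, 3),
            "hedging_change": round(hedging_shift, 3),
            # BEARISH is checked first, so a hedging jump outweighs a sentiment gain.
            "signal": (
                "BEARISH" if sentiment_shift < -0.2 or hedging_shift > 0.05
                else "BULLISH" if sentiment_shift > 0.2
                else "NEUTRAL"
            )
        }
    return changes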
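The signal rule is easiest to reason about in isolation. The sketch below pulls the same thresholds (a sentiment shift beyond 0.2 in either direction, a hedging shift above 0.05) into a hypothetical `classify_shift` function and runs it on a few hand-picked shift values. The inputs are illustrative, not measurements from real calls.

```python
def classify_shift(sentiment_shift, hedging_shift):
    """Apply the quarter-over-quarter thresholds to a pair of shifts."""
    # The bearish test runs first, so it wins whenever both conditions hold.
    if sentiment_shift < -0.2 or hedging_shift > 0.05:
        return "BEARISH"
    if sentiment_shift > 0.2:
        return "BULLISH"
    return "NEUTRAL"

# Hand-picked illustrative inputs: (sentiment shift, hedging shift).
cases = [(-0.4, 0.0), (0.3, 0.0), (0.3, 0.06), (0.1, 0.01)]
for sentiment_shift, hedging_shift in cases:
    print(sentiment_shift, hedging_shift, classify_shift(sentiment_shift, hedging_shift))
```

Note the third case: a sizable sentiment gain is still labeled bearish because hedging rose past its threshold. Whether that ordering is desirable is a design choice worth checking against historical data.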

Scaling the Analysis

To analyze hundreds of earnings calls per quarter, you need robust proxy infrastructure. ThorData provides residential proxies for scraping financial sites, and ScrapeOps monitors scraper performance across sources.
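Beyond proxies, throughput matters when the list of calls is long. One possible approach is to run the per-transcript work in a thread pool, since the job is mostly waiting on network I/O. This is a sketch only: `analyze_many` is a hypothetical helper, `fake_pipeline` is a stand-in so the example runs offline, and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_many(urls, analyze_fn, max_workers=8):
    """Run analyze_fn over urls concurrently; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_fn, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep going with the other transcripts.
                errors[url] = str(exc)
    return results, errors

def fake_pipeline(url):
    """Stand-in for the real per-transcript analysis, so this runs without a network."""
    if "broken" in url:
        raise ValueError("could not parse transcript")
    return {"url_length": len(url)}

results, errors = analyze_many(
    ["https://example.com/a", "https://example.com/broken"], fake_pipeline
)
print(results)
print(errors)
```

Collecting errors per URL instead of raising means one malformed transcript does not abort a batch of hundreds. Keep `max_workers` within whatever concurrency limit your proxy plan allows.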

Putting It All Together

def full_analysis_pipeline(transcript_url):
    segments = scrape_transcript(transcript_url)
    analysis = [analyze_segment(s) for s in segments]
    guidance = extract_guidance(segments)
    # Match both "CEO" and a spelled-out "Chief Executive Officer" role.
    ceo_sentiment = [
        a for a in analysis
        if any(k in a["role"].lower() for k in ("ceo", "chief executive"))
    ]
    # Unweighted mean across speaker segments; 0 when nothing was parsed.
    overall = sum(a["sentiment"] for a in analysis) / len(analysis) if analysis else 0
    return {
        "overall_sentiment": round(overall, 3),
        "ceo_sentiment": ceo_sentiment,
        "guidance_statements": guidance,
        "segment_count": len(segments)
    }

Key Insights

  • Hedging increases in the Q&A section often predict earnings misses next quarter
  • CEO vs CFO sentiment divergence can signal internal disagreements about outlook
  • Word count changes (shorter answers) sometimes indicate discomfort with a topic
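The CEO versus CFO divergence from the list above can be computed directly from per-segment results. A minimal sketch, assuming each result is a dict with `role` and `sentiment` keys as produced by the analyzer; `mean_sentiment` and `ceo_cfo_divergence` are hypothetical helper names and the sample scores are invented.

```python
def mean_sentiment(analysis, keywords):
    """Average sentiment over segments whose role contains any keyword."""
    scores = [
        a["sentiment"] for a in analysis
        if any(k in a["role"].lower() for k in keywords)
    ]
    return sum(scores) / len(scores) if scores else None

def ceo_cfo_divergence(analysis):
    """CEO mean sentiment minus CFO mean sentiment, or None if either is missing."""
    ceo = mean_sentiment(analysis, ("ceo", "chief executive"))
    cfo = mean_sentiment(analysis, ("cfo", "chief financial"))
    if ceo is None or cfo is None:
        return None
    return round(ceo - cfo, 3)

# Invented sample results, for illustration only.
sample = [
    {"role": "Chief Executive Officer", "sentiment": 1.0},
    {"role": "CEO", "sentiment": 0.5},
    {"role": "Chief Financial Officer", "sentiment": -0.25},
    {"role": "Analyst", "sentiment": 0.0},
]
print(ceo_cfo_divergence(sample))
```

A large positive value means the CEO sounds more upbeat than the CFO. Tracking this number across quarters is likely more informative than reading any single value.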

Build this analyzer, run it on historical transcripts, and backtest against stock performance. The patterns are real and exploitable.
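As a starting point for the backtest, here is a rough sketch of a directional hit rate: the share of non-neutral signals whose direction matched the stock's subsequent return. `signal_hit_rate` is a hypothetical helper, and the tickers and returns below are invented placeholders, not real market data. A serious backtest would also need to handle benchmarks, transaction costs, and look-ahead bias.

```python
def signal_hit_rate(signals, next_returns):
    """Fraction of BULLISH/BEARISH signals that matched the sign of the later return."""
    hits = 0
    total = 0
    for ticker, signal in signals.items():
        ret = next_returns.get(ticker)
        if ret is None or signal == "NEUTRAL":
            continue  # no return data, or no directional call to grade
        total += 1
        if (signal == "BULLISH" and ret > 0) or (signal == "BEARISH" and ret < 0):
            hits += 1
    return hits / total if total else None

# Placeholder tickers and made-up returns, for illustration only.
signals = {"AAA": "BULLISH", "BBB": "BEARISH", "CCC": "NEUTRAL", "DDD": "BEARISH"}
next_returns = {"AAA": 0.04, "BBB": 0.02, "CCC": -0.01, "DDD": -0.05}
print(signal_hit_rate(signals, next_returns))
```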
