DEV Community

ivan-digital

Building an NLP Pipeline to Classify 225,000 Central Bank Sentences

The Problem

Central banks communicate through dense, jargon-heavy documents: policy statements, meeting minutes, press conferences. A single Fed statement is 1,500+ words. The ECB publishes minutes in 10,000+ word documents. Multiply that by 26 central banks, each publishing monthly or quarterly, and you have more text than anyone can track manually.

I wanted to answer a simple question: which central banks are turning hawkish and which are turning dovish — right now?

The Approach

Instead of summarizing entire documents, I break them into individual sentences and classify each one. Every sentence gets two labels:

Sentiment (what policy direction does it signal?):

  • rate_hike, rate_cut, rate_hold
  • guidance_hawkish, guidance_dovish
  • dissent_hawkish, dissent_dovish
  • liquidity_easing, liquidity_tightening
  • neutral, irrelevant

Topic (what economic area?):

  • mp_inflation, mp_interest_rate, mp_economic_activity
  • mp_labor_market, mp_exchange_rate, mp_credit
  • financial_stability, fiscal_policy, governance

This gives a granular view — not just "the Fed is hawkish" but "the Fed's inflation language is hawkish while its labor market language is turning dovish."
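One way to pin the two label sets down in code is a pair of enums plus a small record type. This is a minimal sketch, not the pipeline's actual schema: the class names and the `LabeledSentence` fields are assumptions, and the topic assigned to the example sentence is a guess for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(str, Enum):
    RATE_HIKE = "rate_hike"
    RATE_CUT = "rate_cut"
    RATE_HOLD = "rate_hold"
    GUIDANCE_HAWKISH = "guidance_hawkish"
    GUIDANCE_DOVISH = "guidance_dovish"
    DISSENT_HAWKISH = "dissent_hawkish"
    DISSENT_DOVISH = "dissent_dovish"
    LIQUIDITY_EASING = "liquidity_easing"
    LIQUIDITY_TIGHTENING = "liquidity_tightening"
    NEUTRAL = "neutral"
    IRRELEVANT = "irrelevant"


class Topic(str, Enum):
    MP_INFLATION = "mp_inflation"
    MP_INTEREST_RATE = "mp_interest_rate"
    MP_ECONOMIC_ACTIVITY = "mp_economic_activity"
    MP_LABOR_MARKET = "mp_labor_market"
    MP_EXCHANGE_RATE = "mp_exchange_rate"
    MP_CREDIT = "mp_credit"
    FINANCIAL_STABILITY = "financial_stability"
    FISCAL_POLICY = "fiscal_policy"
    GOVERNANCE = "governance"


@dataclass(frozen=True)
class LabeledSentence:
    """Hypothetical record: one sentence with its two labels."""
    bank: str
    text: str
    sentiment: Sentiment
    topic: Topic


# Constructing the enums from raw strings rejects any label outside the schema.
example = LabeledSentence(
    bank="Fed",
    text="Future monetary policy decisions will be conditional on the inflation outlook",
    sentiment=Sentiment("neutral"),
    topic=Topic("mp_inflation"),  # assumed topic, for illustration only
)
```

Parsing model output through `Sentiment(...)` and `Topic(...)` raises `ValueError` on anything outside the eleven sentiment and nine topic labels, which makes malformed classifications easy to spot.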

Architecture

The pipeline has four stages:

1. Crawling

Each central bank has a custom async crawler (Python + aiohttp). Some banks publish clean HTML, others only PDFs, a few require Playwright for JavaScript-rendered pages. The crawlers run daily via Airflow.

Sources per bank:

  • Policy statements and decisions
  • Meeting minutes
  • Press conference transcripts
  • Speeches (for some banks)
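
The concurrency pattern behind a crawler like this can be sketched with the standard library alone. This is an illustration under assumptions, not the actual crawler code: `crawl`, `fake_fetch`, and the `max_concurrency` parameter are hypothetical names, and the fetch coroutine is injected so that an aiohttp or Playwright call could be plugged in per bank.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Any coroutine that takes a URL and returns the page body.
Fetcher = Callable[[str], Awaitable[str]]


async def crawl(
    urls: List[str], fetch: Fetcher, max_concurrency: int = 4
) -> Dict[str, Optional[str]]:
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception:
                # One broken source should not abort the whole run.
                return url, None

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP call so the sketch runs offline."""
    await asyncio.sleep(0)
    if "broken" in url:
        raise RuntimeError("simulated failure")
    return f"<html>{url}</html>"
```

Calling `asyncio.run(crawl(urls, fake_fetch))` returns a dict that maps each URL to its body, or to `None` when the fetch failed.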

2. Sentence Splitting

Documents are split into sentences with rule-based splitting tuned for central bank language. This matters because naive splitting on periods breaks at abbreviations such as "Fed." or "Q4." and at the numbered lists common in policy documents.
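
A rough sketch of what such a rule-based splitter might look like: split at every candidate boundary, then glue pieces back together when the "sentence" ends in a known abbreviation or is just a list number. The abbreviation set and the function name are assumptions for illustration, and the sample sentences in the comments are made up.

```python
import re

# Assumed abbreviation list; a real one would be tuned per bank.
ABBREVIATIONS = {"fed.", "q1.", "q2.", "q3.", "q4.", "no.", "e.g.", "i.e."}

# Candidate boundary: whitespace that follows ., ! or ?
CANDIDATE_BREAK = re.compile(r"(?<=[.!?])\s+")

# A piece that is only a list number, such as "1." or "12."
LIST_MARKER = re.compile(r"\d{1,2}\.")


def split_sentences(text: str) -> list:
    text = text.strip()
    if not text:
        return []
    sentences = []
    buffer = ""
    for piece in CANDIDATE_BREAK.split(text):
        buffer = f"{buffer} {piece}" if buffer else piece
        last_token = buffer.split()[-1].lower()
        if last_token in ABBREVIATIONS or LIST_MARKER.fullmatch(buffer):
            continue  # not a real sentence end, keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


# split_sentences("The Fed. held rates in Q4. 2025 as expected. Inflation eased.")
# keeps "The Fed. held rates in Q4. 2025 as expected." as one sentence.
```

The trade-off is that a sentence that genuinely ends in "Q4." gets merged with the next one, so a list like this needs checking against real documents.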

3. Classification

Each sentence is classified by an LLM with bank-specific prompt rules. The key insight: central bank language is domain-specific enough that generic sentiment analysis fails badly.

Examples that trip up generic classifiers:

| Sentence | Naive Classification | Correct |
|---|---|---|
| "Future monetary policy decisions will be conditional on the inflation outlook" | guidance_hawkish | neutral (boilerplate) |
| "The member voted against the rate increase" | dissent_hawkish | dissent_dovish (wanted lower rates) |
| "Average interest rate on ruble loans rose to 8.5%" | rate_hike | neutral (market rate description, not policy) |

To catch these errors, each sentence is classified twice at different temperatures (0.0 and 0.1). Disagreements are flagged for review.
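
The double-classification check can be sketched as follows. The LLM call is replaced by an injected function because the actual model and client are not specified here; `Classifier`, `Verdict`, `classify_with_check`, and `stub_classify` are all hypothetical names.

```python
from typing import Callable, NamedTuple, Optional, Tuple

# (sentence, temperature) -> label; in practice this would wrap an LLM call.
Classifier = Callable[[str, float], str]


class Verdict(NamedTuple):
    sentence: str
    label: Optional[str]      # None when the runs disagree
    needs_review: bool
    votes: Tuple[str, ...]    # one label per temperature, in order


def classify_with_check(
    sentence: str,
    classify: Classifier,
    temperatures: Tuple[float, ...] = (0.0, 0.1),
) -> Verdict:
    """Classify once per temperature and flag any disagreement."""
    votes = tuple(classify(sentence, t) for t in temperatures)
    agreed = len(set(votes)) == 1
    return Verdict(sentence, votes[0] if agreed else None, not agreed, votes)


def stub_classify(sentence: str, temperature: float) -> str:
    """Deterministic stand-in for an LLM so the sketch runs offline."""
    if "voted against" in sentence:
        return "dissent_dovish" if temperature == 0.0 else "dissent_hawkish"
    return "neutral"
```

With the stub, the dissent sentence from the table above comes back with `needs_review=True` and no label, while an unambiguous sentence gets a single agreed label.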

4. Aggregation

Sentence-level classifications are aggregated into document-level and bank-level metrics:

  • Hawk/dove ratio per document
  • Stance shifts over time
  • Dissent tracking (who dissented and in which direction)
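
To make the first two metrics concrete, here is a minimal sketch. The exact formulas are assumptions: which labels count as hawkish or dovish is inferred from their names, and the balance score `(hawk - dove) / (hawk + dove)` is one common choice rather than a confirmed definition of the ratio used in the pipeline.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Assumed grouping of the sentiment labels by direction.
HAWKISH = {"rate_hike", "guidance_hawkish", "dissent_hawkish", "liquidity_tightening"}
DOVISH = {"rate_cut", "guidance_dovish", "dissent_dovish", "liquidity_easing"}


def hawk_dove_balance(labels: Iterable[str]) -> Optional[float]:
    """Score in [-1, 1]: +1 is all hawkish, -1 is all dovish.

    Neutral, irrelevant, and rate_hold sentences are ignored.
    Returns None when the document has no directional sentences.
    """
    counts = Counter(labels)
    hawk = sum(counts[label] for label in HAWKISH)
    dove = sum(counts[label] for label in DOVISH)
    if hawk + dove == 0:
        return None
    return (hawk - dove) / (hawk + dove)


def stance_shifts(scores: List[float]) -> List[float]:
    """Change in balance between consecutive documents, oldest first."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]
```

A document with two hawkish sentences, one dovish sentence, and any number of neutral ones scores 1/3 under this definition.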

What I Learned

Dissent direction is counterintuitive. If the majority voted to hike rates and one member dissented, that dissent is dovish — the dissenter wanted lower rates. This seems obvious in retrospect, but getting the prompts right took several iterations.
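
The rule itself is simple enough to write down as a sanity check on the model's output. This is a sketch of the logic only, not how the prompts implement it; the function name and the coarse cut/hold/hike ordering are assumptions.

```python
# Order the three decisions from most dovish to most hawkish.
_RATE_ORDER = {"rate_cut": -1, "rate_hold": 0, "rate_hike": 1}


def dissent_label(majority_decision: str, dissenter_preference: str) -> str:
    """Label a dissent by the direction the dissenter wanted, relative to the majority."""
    diff = _RATE_ORDER[dissenter_preference] - _RATE_ORDER[majority_decision]
    if diff == 0:
        # Same coarse decision, for example a dissent over the size of a hike.
        raise ValueError("preference matches the majority at this granularity")
    return "dissent_hawkish" if diff > 0 else "dissent_dovish"
```

A member who preferred to hold while the majority hiked comes out as `dissent_dovish`. Dissents over the size of a move need the actual basis points and are out of scope for this coarse version.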

Boilerplate is the enemy. Every central bank repeats the same conditional phrases meeting after meeting: "future decisions will depend on incoming data." These aren't signals — they're filler. The classifier needed explicit examples of common boilerplate to avoid false positives.
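
A cheap complement to prompt examples is a regex pre-filter that catches the most common phrasings before they reach the model. To be clear, this is a swapped-in technique for illustration, not what the pipeline above does, and the single pattern below is an assumption built from the two boilerplate sentences quoted in this post.

```python
import re

# Assumed pattern list; extend it with phrases that recur across meetings.
BOILERPLATE_PATTERNS = [
    re.compile(
        r"\bfuture (monetary policy )?decisions will (depend|be conditional) on\b",
        re.IGNORECASE,
    ),
]


def is_boilerplate(sentence: str) -> bool:
    """True when the sentence matches a known conditional filler phrase."""
    return any(pattern.search(sentence) for pattern in BOILERPLATE_PATTERNS)
```

Sentences that match could be labeled neutral directly, or passed to the classifier with a hint, depending on how much the pattern list is trusted.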

Bank-specific rules matter. The PBOC communicates completely differently from the Fed: PBOC statements are short and formulaic, while Fed minutes are discursive, with extensive debate. The Bank of Russia's quarterly reviews describe market conditions that look like policy decisions but aren't. Each bank required tailored prompt rules.
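
One plausible way to organize such rules is a shared list plus a per-bank dictionary, merged when the prompt is built. The rule wording, the prompt layout, and the names `COMMON_RULES`, `BANK_RULES`, and `build_prompt` are assumptions for illustration, not the prompts actually used.

```python
# Rules that apply to every bank (wording is illustrative).
COMMON_RULES = [
    "Label dissents by the direction the dissenter preferred: lower rates "
    "than the majority is dissent_dovish, higher is dissent_hawkish.",
    "Recurring conditional phrases such as 'future decisions will depend "
    "on incoming data' are neutral.",
]

# Extra rules keyed by bank name (only one example bank filled in here).
BANK_RULES = {
    "Bank of Russia": [
        "Sentences describing market lending or deposit rates are neutral, "
        "not rate_hike or rate_cut.",
    ],
}


def build_prompt(bank: str, sentence: str) -> str:
    """Assemble a classification prompt from common and bank-specific rules."""
    rules = COMMON_RULES + BANK_RULES.get(bank, [])
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"You are labeling sentences from {bank} publications.\n"
        f"Rules:\n{rule_lines}\n"
        f"Sentence: {sentence}\n"
        "Label:"
    )
```

Banks without an entry in `BANK_RULES` simply get the common rules, so a new bank can be added first and tuned later.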

Current Scale

The Divergences Right Now

Some current policy stances that stand out:

| Bank | Rate | Stance | Notable |
|---|---|---|---|
| BOJ | 0.75% | Cautiously hawkish | Normalizing after decades at zero |
| SNB | 0.00% | Neutral | Back to the floor |
| TCMB | 37% | Hawkish | Emergency tightening |
| PBOC | 3.00% | Dovish | Supporting growth |
| BCB | 14.75% | Hawkish | Among highest G20 |
| Fed | 3.75% | Mixed | Cutting but cautious language |

Live Dashboard

The full dashboard is at monetary.live — each bank has its own page with statement history, sentiment breakdowns, and policy metrics.

I also track tech trends with a separate pipeline at pulsar.ivan.digital (arXiv papers, GitHub repos, Reddit discussions).

Tech Stack

  • Python async crawlers (aiohttp, Playwright)
  • LLM classification with self-validation
  • SQLite for storage
  • Airflow for orchestration
  • Firebase Hosting for the dashboard
  • structlog for logging

Would love feedback on the methodology. If you work with central bank text data or NLP for finance, I'd be curious to hear what approaches you've tried.
