Using LLMs for Financial Analysis and Forecasting

Financial analysis demands more than surface-level summarization. Analysts routinely synthesize hundreds of pages of SEC filings, earnings call transcripts, and macroeconomic reports to model risk and project revenue. Large language models can automate this extraction, but production pipelines require careful handling of long context windows, structured output, and iterative reasoning. This article walks through a practical architecture for LLM-driven financial forecasting, and explains where Oxlo.ai fits as a cost-effective, developer-first inference layer.

The Long-Context Challenge in Financial NLP

Financial documents are inherently verbose. A single 10-K filing can exceed 50,000 tokens, and earnings call transcripts often add tens of thousands more. Under token-based billing, every input token increases cost, which makes it expensive to reason over complete documents in a single pass. Oxlo.ai uses request-based pricing, charging one flat cost per API call regardless of prompt length. For workflows that ingest entire filings into context, this can be significantly cheaper than token-based alternatives. The platform also hosts models with extended context windows, including DeepSeek V4 Flash with 1M context and Kimi K2.6 with 131K context, making it feasible to analyze full documents without aggressive chunking strategies.
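
Before sending a whole filing in one request, it helps to check roughly whether it fits the target model's context window. The sketch below is one way to do that. It assumes about four characters per token, a common rule of thumb for English text rather than the output of any real tokenizer, and it rounds the 131K window to 131,000 tokens. The function names and the 4,000-token output reserve are illustrative choices, not part of any SDK.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, context_window: int, reserved_for_output: int = 4_000) -> bool:
    """Check whether the document plus an output budget fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


# Context sizes quoted in this article, rounded (assumption: 131K ~ 131,000).
CONTEXT_WINDOWS = {"Kimi K2.6": 131_000, "DeepSeek V4 Flash": 1_000_000}


def models_that_fit(text: str) -> list[str]:
    """Return the names of the listed models whose window can hold the text."""
    return [name for name, window in CONTEXT_WINDOWS.items() if fits_in_context(text, window)]
```

For a real pipeline, replacing the heuristic with the model's own tokenizer gives a more reliable answer, especially for filings heavy on tables and numbers.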

A Practical Pipeline for Analysis and Forecasting

A production pipeline typically has three stages: ingestion, structured extraction, and forecasting. The code below folds ingestion into the extraction stage for brevity. Because Oxlo.ai is compatible with the OpenAI SDK, you can implement the pipeline with the standard Python client by changing the base URL.

Stage 1: Structured extraction with JSON mode

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.getenv("OXLO_API_KEY")
)

# Load a long 10-K filing that has already been converted to plain text
with open("aapl_10k.txt", encoding="utf-8") as f:
    filing_text = f.read()

extraction_prompt = f"""
Extract the following from the financial filing below:
- fiscal_year_revenue
- total_long_term_debt
- risk_factors_summary

Respond with valid JSON only.
Filing:
{filing_text}
"""

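# JSON mode asks the model to return a single JSON object
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": extraction_prompt}],
    response_format={"type": "json_object"}
)

# message.content is a JSON-formatted string, not a Python dict
metrics = response.choices[0].message.content
print(metrics)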
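
The extraction call returns a string, so a downstream step would normally parse it and confirm that the requested fields are present before using them. The following is a minimal sketch of that check. The helper name `parse_metrics` and the decision to raise `ValueError` are illustrative choices. The field names are the ones requested in the prompt above.

```python
import json

# Field names requested in the extraction prompt
REQUIRED_FIELDS = ("fiscal_year_revenue", "total_long_term_debt", "risk_factors_summary")


def parse_metrics(raw: str) -> dict:
    """Parse the model's JSON string and confirm the requested fields exist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return data


# Usage with the Stage 1 result (hypothetical variable name):
# metrics_dict = parse_metrics(metrics)
```

Failing loudly here keeps a malformed or incomplete extraction from silently feeding the forecasting stage.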

Stage 2: Forecasting with chain-of-thought reasoning

forecast_prompt = f"""
Given these extracted metrics: {metrics},
project next-year revenue growth and identify three macroeconomic risks.
Show your reasoning before the final projection.
"""

response = client.chat.completions.create(
    model="deepseek-r1-671b",
    messages=[{"role": "user", "content": forecast_prompt}]
)

print(response.choices[0].message.content)

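JSON mode in the first stage constrains the model to emit syntactically valid JSON, which makes the output straightforward to parse. It does not enforce the requested field names, and a response cut off at the output limit can still be incomplete, so the result should be validated before use. A reasoning model in the second stage handles the multi-step logic required for credible forecasts.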

Model Selection for Financial Workloads

Oxlo.ai provides more than 45 models across seven categories. For financial analysis, the following are particularly relevant:

DeepSeek R1 671B MoE: Best for deep reasoning over complex numerical relationships and multi-step forecasting.
Kimi K2.6: Offers advanced reasoning, agentic coding, and vision capabilities with a 131K context window. Useful for interpreting charts in earnings slide decks.
Qwen 3 32B: Strong multilingual performance for non-English filings and agent workflows.
Llama 3.3 70B: A reliable general-purpose flagship for consistent structured extraction.
GLM 5 (744B MoE): Designed for long-horizon agentic tasks that chain multiple tool calls across disparate data sources.

All of these models support function calling, JSON mode, and streaming responses, so you can build agentic loops that query internal databases or calculators without leaving the OpenAI SDK pattern.
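
One simple way to apply these recommendations in code is a lookup from pipeline stage to model. The sketch below uses only the model identifiers that appear in this article's own examples. The stage names and the `pick_model` helper are hypothetical, and identifiers for the other listed models are left out because they are not shown here.

```python
# Model identifiers as used in this article's code examples
MODEL_BY_TASK = {
    "extraction": "llama-3.3-70b",      # consistent structured extraction
    "forecasting": "deepseek-r1-671b",  # multi-step numerical reasoning
    "agent": "qwen-3-32b",              # function-calling agent workflows
}


def pick_model(task: str, default: str = "llama-3.3-70b") -> str:
    """Return the model for a pipeline stage, falling back to a general model."""
    return MODEL_BY_TASK.get(task, default)
```

Keeping the mapping in one place makes it easy to swap a model for a single stage without touching the rest of the pipeline.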

Building Agentic Forecasting Loops

Financial forecasting rarely succeeds in a single prompt. Production agents often retrieve historical time-series data, calculate derived ratios, and refine projections through multiple turns. Because Oxlo.ai charges per request rather than per token, iterative agent loops that repeatedly send long context do not trigger runaway costs. You can include the full filing text on every turn without the bill scaling linearly with token volume.
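
The arithmetic behind this claim can be sketched as follows. The prices in the usage comment are made-up placeholders for illustration, not the rates of Oxlo.ai or any other provider, and the calculation ignores output tokens to keep the comparison simple.

```python
def token_billed_cost(turns: int, context_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input cost when every turn resends the same context under per-token billing."""
    return turns * context_tokens * usd_per_million_input_tokens / 1_000_000


def request_billed_cost(turns: int, usd_per_request: float) -> float:
    """Cost when each API call has a flat price regardless of prompt length."""
    return turns * usd_per_request


# Placeholder numbers: a 10-turn agent loop that resends a 60,000-token filing.
# token_billed_cost(10, 60_000, usd_per_million_input_tokens=1.00)
# request_billed_cost(10, usd_per_request=0.01)
```

Under per-token billing the cost grows with both the number of turns and the context size, while under per-request billing it grows with the number of turns only. Which option is cheaper in practice depends on the actual rates and typical prompt lengths.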

The example below shows a function-calling setup that lets the model request external data during a forecast:

tools = [{
    "type": "function",
    "function": {
        "name": "get_historical_eps",
        "description": "Retrieve diluted EPS for a ticker over N years",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "years": {"type": "integer"}
            },
            "required": ["ticker", "years"]
        }
    }
}]

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "system", "content": "You are a financial forecasting assistant."},
        {"role": "user", "content": "Forecast AAPL EPS for 2026 using historical data."}
    ],
    tools=tools
)
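
The call above only lets the model request a tool. The application still has to run it and send the result back. The sketch below shows one way to do that dispatch step, assuming the response follows the OpenAI chat-completions shape, where each tool call has an `id`, a `function.name`, and `function.arguments` as a JSON string. The helper `run_tool_calls` and the registry argument are hypothetical, and the actual EPS lookup is left to the caller.

```python
import json
from typing import Any, Callable


def run_tool_calls(tool_calls, registry: dict[str, Callable[..., Any]]) -> list[dict]:
    """Execute each requested tool and build the 'tool' messages to send back."""
    messages = []
    for call in tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            # Report unknown tools to the model instead of crashing the loop
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            result = func(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return messages


# Usage sketch (my_eps_lookup is a placeholder for your own data source):
# message = response.choices[0].message
# if message.tool_calls:
#     tool_messages = run_tool_calls(message.tool_calls, {"get_historical_eps": my_eps_lookup})
#     # Append `message` and `tool_messages` to the conversation, then call the API again.
```

Repeating this step until the model stops requesting tools gives the multi-turn loop described in this section.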

Since there are no cold starts on