TL;DR
UCLA Tauric Research released TradingAgents v0.2.4 (2026-04-25) — a LangGraph-based multi-agent LLM framework that mimics a real trading firm with 5 layers and ~12 agents. The new release adds Pydantic-typed structured outputs, LangGraph checkpoint resumption, a persistent decision-memory file, 5-tier rating, and 10 LLM provider integrations. Backtest on AAPL/GOOGL/AMZN shows 23-27% cumulative return.
⚠️ Disclaimer: backtest only. Not financial advice. Paper trade before any real capital.
Why this is interesting beyond "another trading bot"
Most LLM trading bots are a single model with a giant prompt. They are acutely prone to confirmation bias — once they form an initial thesis, they cherry-pick evidence to support it.
TradingAgents counters this structurally with 5 layers of explicit role-based agents that argue with each other:
[Analyst Team x4] -> [Bull vs Bear debate]
|
v
[Trader (3-tier)]
|
v
[Risk Mgmt: Aggressive vs Conservative vs Neutral]
|
v
[Portfolio Manager (5-tier)] -> Buy / Overweight / Hold / Underweight / Sell
The whole pipeline runs on a LangGraph state graph with explicit handoffs. You can replace any node, log any state, and resume from any checkpoint.
v0.2.4 highlights for developers
1. Structured output decision agents
The Research Manager, Trader, and Portfolio Manager now use llm.with_structured_output(Schema) with Pydantic schemas. This works across:
- OpenAI: json_schema
- Google Gemini: response_schema
- Anthropic Claude: tool use
- Other models: function-calling fallback
No more brittle text parsing for decision values.
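The pattern looks roughly like the sketch below. The schema fields here (action, confidence, rationale) are illustrative assumptions, not the framework's actual schema; only the `with_structured_output` binding is LangChain's real API.

```python
from typing import Literal
from pydantic import BaseModel, Field

# Illustrative decision schema -- field names are assumptions,
# not the framework's actual Pydantic models.
class TraderDecision(BaseModel):
    action: Literal["Buy", "Hold", "Sell"]
    confidence: float = Field(ge=0.0, le=1.0, description="Model-stated confidence")
    rationale: str

# With any LangChain chat model, the binding looks like:
#   structured_llm = llm.with_structured_output(TraderDecision)
#   decision = structured_llm.invoke(prompt)  # returns a validated TraderDecision

# The payoff: provider output is parsed and validated, never regex-scraped.
raw = {"action": "Buy", "confidence": 0.8, "rationale": "Momentum plus earnings beat."}
decision = TraderDecision.model_validate(raw)
print(decision.action)  # Buy
```

If the model emits an out-of-range confidence or an unknown action, validation fails loudly instead of propagating garbage downstream.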
2. LangGraph checkpoint resumption
Pass --checkpoint to enable per-node state persistence:
~/.tradingagents/cache/checkpoints/<TICKER>.db
If your run crashes after the Bull/Bear debate but before Risk Management, you resume from there instead of paying for the full pipeline again. Big API cost savings.
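To make the savings concrete, here is a hand-rolled mini-checkpointer that illustrates the idea; this is NOT LangGraph's actual checkpointer API, just the concept: each completed node persists its state to SQLite, and a resumed run skips every node already paid for.

```python
import json
import os
import sqlite3
import tempfile

# Conceptual sketch of per-node checkpointing (not LangGraph's real API).
PIPELINE = ["analysts", "bull_bear_debate", "trader", "risk_mgmt", "portfolio_manager"]

def run_node(name, state):
    # Stand-in for an expensive LLM call.
    state[name] = f"output of {name}"
    return state

def run_with_checkpoints(db_path, crash_after=None):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS ckpt (node TEXT PRIMARY KEY, state TEXT)")
    done = {row[0] for row in conn.execute("SELECT node FROM ckpt")}
    state = {}
    for (saved,) in conn.execute("SELECT state FROM ckpt"):
        state.update(json.loads(saved))
    executed = []
    for node in PIPELINE:
        if node in done:
            continue  # resume: skip nodes (and API spend) already completed
        state = run_node(node, state)
        conn.execute("INSERT INTO ckpt VALUES (?, ?)", (node, json.dumps(state)))
        conn.commit()
        executed.append(node)
        if node == crash_after:
            conn.close()
            return executed, state  # simulated crash mid-pipeline
    conn.close()
    return executed, state

db = os.path.join(tempfile.mkdtemp(), "NVDA.db")
first, _ = run_with_checkpoints(db, crash_after="bull_bear_debate")
second, _ = run_with_checkpoints(db)
print(first)   # ['analysts', 'bull_bear_debate']
print(second)  # ['trader', 'risk_mgmt', 'portfolio_manager']
```

The second run re-executes only the three nodes after the crash point, which is exactly the behavior you get for free from LangGraph's checkpointers.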
3. Persistent decision memory
~/.tradingagents/memory/trading_memory.md
Every run appends a decision record. On the next run for the same ticker, the framework auto-injects:
- Previous decision
- Actual realized return (raw + alpha vs SPY)
- A one-paragraph retrospective
into the Portfolio Manager prompt. This is automated trading-journal-as-context — the framework literally learns from its own past mistakes.
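A minimal sketch of the journal-as-context mechanic is below. The file layout, field names, and injection wording are my assumptions for illustration; the framework's actual memory format may differ.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical memory file (the real one lives at ~/.tradingagents/memory/).
memory = Path(tempfile.mkdtemp()) / "trading_memory.md"

def append_record(ticker, decision, realized, alpha, retrospective):
    # Append one markdown decision record per run (format is illustrative).
    entry = (
        f"\n## {ticker} ({date.today().isoformat()})\n"
        f"- Decision: {decision}\n"
        f"- Realized return: {realized:+.1%} (alpha vs SPY: {alpha:+.1%})\n"
        f"- Retrospective: {retrospective}\n"
    )
    with memory.open("a") as f:
        f.write(entry)

def inject_history(ticker, base_prompt):
    # Pull this ticker's past records into the Portfolio Manager prompt.
    if not memory.exists():
        return base_prompt
    blocks = [b for b in memory.read_text().split("\n## ") if b.startswith(ticker)]
    if not blocks:
        return base_prompt
    return base_prompt + "\n\nPast decisions for this ticker:\n## " + "## ".join(blocks)

append_record("NVDA", "Overweight", 0.042, 0.018, "Thesis was right; sized too small.")
prompt = inject_history("NVDA", "You are the Portfolio Manager...")
print("Overweight" in prompt)  # True
```

The key design point survives any format change: feedback (realized return and alpha) is attached to the decision that produced it, so the model sees consequences, not just history.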
4. 5-tier rating system
The Portfolio Manager now outputs:
| Rating | Meaning |
|---|---|
| Buy | strong buy |
| Overweight | increase position |
| Hold | maintain |
| Underweight | reduce position |
| Sell | exit |
The Trader still uses 3-tier (Buy/Hold/Sell). Only the final Portfolio Manager gets the finer granularity.
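Why the extra granularity matters: a 5-tier rating maps naturally onto position sizing rather than a binary in/out decision. The mapping below is a hypothetical execution-layer choice, not part of the framework.

```python
from typing import Literal

Rating = Literal["Buy", "Overweight", "Hold", "Underweight", "Sell"]

# Hypothetical mapping from rating to a sizing multiplier; the actual
# translation into trades is left to your execution layer.
TARGET_DELTA = {
    "Buy": +1.0, "Overweight": +0.5, "Hold": 0.0,
    "Underweight": -0.5, "Sell": -1.0,
}

def adjust(current_weight: float, rating: Rating, step: float = 0.02) -> float:
    """Nudge a portfolio weight by a rating-scaled step."""
    return round(current_weight + TARGET_DELTA[rating] * step, 4)

print(adjust(0.05, "Overweight"))  # 0.06
```

With only 3 tiers, "Overweight" and "Buy" would collapse into the same action; the 5-tier output preserves the distinction between conviction levels.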
5. 10 LLM providers
config["llm_provider"] = "anthropic" # or:
# openai, google, anthropic, xai, deepseek,
# qwen, glm, openrouter, ollama, azure
Local Ollama is the cost-killer if you don't need top-tier reasoning.
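A cost-minimizing local setup might look like this. The config keys match the article's quick start; the Ollama model tags are examples of my choosing, not recommendations from the project.

```python
# Local, zero-API-cost configuration sketch. Model tags are examples;
# any model served by a local Ollama instance should work.
local_config = {
    "llm_provider": "ollama",
    "deep_think_llm": "qwen2.5:32b",   # debates and final decision
    "quick_think_llm": "llama3.1:8b",  # analyst data summarization
    "max_debate_rounds": 1,            # fewer rounds, fewer tokens
}
print(local_config["llm_provider"])  # ollama
```

The trade-off is reasoning quality in the debate layers; for pipeline development and plumbing tests, that usually doesn't matter.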
Quick start
from tradingagents.graph.trading_graph import TradingAgentsGraph
from tradingagents.default_config import DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
config["llm_provider"] = "anthropic"
config["deep_think_llm"] = "claude-opus-4-7"    # debates + final decision
config["quick_think_llm"] = "claude-haiku-4-5"  # analyst summarization
config["max_debate_rounds"] = 2
config["checkpoint_enabled"] = True

ta = TradingAgentsGraph(debug=True, config=config)
_, decision = ta.propagate("NVDA", "2026-01-15")
print(decision)
The split between deep_think_llm and quick_think_llm is the cost-optimization sweet spot:
- deep_think_llm — used for Bull/Bear debate, Risk debate, Portfolio Manager (heavy reasoning)
- quick_think_llm — used for Analyst data summarization (lightweight)
For Claude users, Opus 4.7 + Haiku 4.5 is a clean combo.
CLI mode
tradingagents
Interactive prompt asks for ticker, date, LLM provider, and research depth. Great for quick experiments.
Backtest performance (paper figures)
On AAPL / GOOGL / AMZN:
| Metric | TradingAgents | Baseline |
|---|---|---|
| Cumulative Return | 23.21%–26.62% | lower |
| Annualized Return | up to 30.5% | lower |
| Sharpe Ratio | improved | baseline |
| Max Drawdown | improved | baseline |
Source: arXiv 2412.20138 v7 (Yijia Xiao et al.).
What's NOT in the paper
The honest limitations:
- Slippage is not modeled
- Taxes are not deducted
- Market impact for non-trivial positions is ignored
- Live data latency is assumed to be zero in the backtest
- All numbers are AAPL/GOOGL/AMZN only — large-cap US tech, the easiest regime
Translation: treat it as a research framework. Don't commit real capital based on the paper figures alone.
Generalizing the pattern
The interesting part isn't trading — it's the multi-agent debate pattern itself. The same 5-layer structure could be applied to:
- Content production (writer + editor + SEO + reviewer + final approver)
- Marketing campaigns (strategy + copy + design + measurement)
- Hiring decisions (technical + culture-fit + reference + final)
- Pricing decisions (cost + market + competitive + customer)
Anywhere a single human or a single LLM tends to rationalize its initial take, structured debate between two or more adversarial agents helps.
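Stripped of the trading domain, the core loop is small. The sketch below uses stub functions in place of LLM calls; in practice each role would be a model invocation with an adversarial system prompt ("argue for", "argue against", "judge the transcript").

```python
# Minimal skeleton of the adversarial-debate pattern, domain-agnostic.
# The agents here are stubs standing in for role-prompted LLM calls.

def pro_agent(topic, transcript):
    return f"Pro: {topic} has strong upside given the evidence so far."

def con_agent(topic, transcript):
    return f"Con: {topic} carries unpriced downside risk."

def judge(transcript):
    # A real judge agent would weigh the arguments; this stub just
    # summarizes the record it was handed.
    return {"decision": "Hold", "rounds": len(transcript) // 2}

def debate(topic, rounds=2):
    transcript = []
    for _ in range(rounds):
        # Each side sees the full transcript, so later rounds can rebut.
        transcript.append(pro_agent(topic, transcript))
        transcript.append(con_agent(topic, transcript))
    return judge(transcript)

result = debate("NVDA", rounds=2)
print(result)  # {'decision': 'Hold', 'rounds': 2}
```

The structural trick is that neither side is asked for a balanced view; balance emerges from the judge reading a transcript where both extremes were argued in earnest.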
Final thoughts
TradingAgents is not a "make money with AI" toolkit. It's a research framework that demonstrates the multi-agent paradigm works in a high-stakes domain. The v0.2.4 additions (structured outputs, checkpoints, persistent memory) make it actually usable for serious experimentation — not just paper-friendly demos.
Worth cloning, reading the LangGraph code, and stealing patterns for your own multi-agent systems.
⚠️ Disclaimer: This article analyzes the TradingAgents framework from a technical perspective. It is not investment advice. Backtest numbers are historical and do not guarantee future returns. Always paper trade extensively before deploying real capital.