TL;DR
UCLA Tauric Research released TradingAgents v0.2.4 (2026-04-25) — a LangGraph-based multi-agent LLM framework that mimics a real trading firm with 5 layers and ~12 agents. The new release adds Pydantic-typed structured outputs, LangGraph checkpoint resumption, a persistent decision-memory file, 5-tier rating, and 10 LLM provider integrations. Backtest on AAPL/GOOGL/AMZN shows 23-27% cumulative return.
⚠️ Disclaimer: backtest only. Not financial advice. Paper trade before any real capital.
Why this is interesting beyond "another trading bot"
Most LLM trading bots are a single model with a giant prompt. They are acutely prone to confirmation bias — once they form an initial thesis, they cherry-pick evidence to support it.
TradingAgents counters this structurally with 5 layers of explicit role-based agents that argue with each other:
[Analyst Team x4] -> [Bull vs Bear debate]
|
v
[Trader (3-tier)]
|
v
[Risk Mgmt: Aggressive vs Conservative vs Neutral]
|
v
[Portfolio Manager (5-tier)] -> Buy / Overweight / Hold / Underweight / Sell
The whole pipeline runs on a LangGraph state graph with explicit handoffs. You can replace any node, log any state, and resume from any checkpoint.
v0.2.4 highlights for developers
1. Structured output decision agents
The Research Manager, Trader, and Portfolio Manager now use llm.with_structured_output(Schema) with Pydantic schemas. This works across:
- OpenAI: json_schema
- Google Gemini: response_schema
- Anthropic Claude: tool use
- Other models: function-calling fallback
No more brittle text parsing for decision values.
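The pattern looks roughly like the sketch below. The schema fields here (action, confidence, rationale) are illustrative assumptions, not the framework's actual schema; only the `with_structured_output` binding is LangChain's real API.

```python
from typing import Literal
from pydantic import BaseModel, Field

# Illustrative decision schema -- field names are assumptions,
# not the framework's actual Pydantic models.
class TraderDecision(BaseModel):
    action: Literal["Buy", "Hold", "Sell"]
    confidence: float = Field(ge=0.0, le=1.0, description="Model-stated confidence")
    rationale: str

# With any LangChain chat model, the binding looks like:
#   structured_llm = llm.with_structured_output(TraderDecision)
#   decision = structured_llm.invoke(prompt)  # returns a validated TraderDecision

# The payoff: provider output is parsed and validated, never regex-scraped.
raw = {"action": "Buy", "confidence": 0.8, "rationale": "Momentum plus earnings beat."}
decision = TraderDecision.model_validate(raw)
print(decision.action)  # Buy
```

If the model emits an out-of-range confidence or an unknown action, validation fails loudly instead of propagating garbage downstream.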
2. LangGraph checkpoint resumption
Pass --checkpoint to enable per-node state persistence:
~/.tradingagents/cache/checkpoints/<TICKER>.db
If your run crashes after the Bull/Bear debate but before Risk Management, you resume from there instead of paying for the full pipeline again. Big API cost savings.
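To make the savings concrete, here is a hand-rolled mini-checkpointer that illustrates the idea; this is NOT LangGraph's actual checkpointer API, just the concept: each completed node persists its state to SQLite, and a resumed run skips every node already paid for.

```python
import json
import os
import sqlite3
import tempfile

# Conceptual sketch of per-node checkpointing (not LangGraph's real API).
PIPELINE = ["analysts", "bull_bear_debate", "trader", "risk_mgmt", "portfolio_manager"]

def run_node(name, state):
    # Stand-in for an expensive LLM call.
    state[name] = f"output of {name}"
    return state

def run_with_checkpoints(db_path, crash_after=None):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS ckpt (node TEXT PRIMARY KEY, state TEXT)")
    done = {row[0] for row in conn.execute("SELECT node FROM ckpt")}
    state = {}
    for (saved,) in conn.execute("SELECT state FROM ckpt"):
        state.update(json.loads(saved))
    executed = []
    for node in PIPELINE:
        if node in done:
            continue  # resume: skip nodes (and API spend) already completed
        state = run_node(node, state)
        conn.execute("INSERT INTO ckpt VALUES (?, ?)", (node, json.dumps(state)))
        conn.commit()
        executed.append(node)
        if node == crash_after:
            conn.close()
            return executed, state  # simulated crash mid-pipeline
    conn.close()
    return executed, state

db = os.path.join(tempfile.mkdtemp(), "NVDA.db")
first, _ = run_with_checkpoints(db, crash_after="bull_bear_debate")
second, _ = run_with_checkpoints(db)
print(first)   # ['analysts', 'bull_bear_debate']
print(second)  # ['trader', 'risk_mgmt', 'portfolio_manager']
```

The second run re-executes only the three nodes after the crash point, which is exactly the behavior you get for free from LangGraph's checkpointers.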
3. Persistent decision memory
~/.tradingagents/memory/trading_memory.md
Every run appends a decision record. On the next run for the same ticker, the framework auto-injects:
- Previous decision
- Actual realized return (raw + alpha vs SPY)
- A one-paragraph retrospective
into the Portfolio Manager prompt. This is automated trading-journal-as-context — the framework literally learns from its own past mistakes.
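A minimal sketch of the journal-as-context mechanic is below. The file layout, field names, and injection wording are my assumptions for illustration; the framework's actual memory format may differ.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical memory file (the real one lives at ~/.tradingagents/memory/).
memory = Path(tempfile.mkdtemp()) / "trading_memory.md"

def append_record(ticker, decision, realized, alpha, retrospective):
    # Append one markdown decision record per run (format is illustrative).
    entry = (
        f"\n## {ticker} ({date.today().isoformat()})\n"
        f"- Decision: {decision}\n"
        f"- Realized return: {realized:+.1%} (alpha vs SPY: {alpha:+.1%})\n"
        f"- Retrospective: {retrospective}\n"
    )
    with memory.open("a") as f:
        f.write(entry)

def inject_history(ticker, base_prompt):
    # Pull this ticker's past records into the Portfolio Manager prompt.
    if not memory.exists():
        return base_prompt
    blocks = [b for b in memory.read_text().split("\n## ") if b.startswith(ticker)]
    if not blocks:
        return base_prompt
    return base_prompt + "\n\nPast decisions for this ticker:\n## " + "## ".join(blocks)

append_record("NVDA", "Overweight", 0.042, 0.018, "Thesis was right; sized too small.")
prompt = inject_history("NVDA", "You are the Portfolio Manager...")
print("Overweight" in prompt)  # True
```

The key design point survives any format change: feedback (realized return and alpha) is attached to the decision that produced it, so the model sees consequences, not just history.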
4. 5-tier rating system
The Portfolio Manager now outputs:
| Rating | Meaning |
|---|---|
| Buy | strong buy |
| Overweight | increase position |
| Hold | maintain |
| Underweight | reduce position |
| Sell | exit |
The Trader still uses 3-tier (Buy/Hold/Sell). Only the final Portfolio Manager gets the finer granularity.
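Why the extra granularity matters: a 5-tier rating maps naturally onto position sizing rather than a binary in/out decision. The mapping below is a hypothetical execution-layer choice, not part of the framework.

```python
from typing import Literal

Rating = Literal["Buy", "Overweight", "Hold", "Underweight", "Sell"]

# Hypothetical mapping from rating to a sizing multiplier; the actual
# translation into trades is left to your execution layer.
TARGET_DELTA = {
    "Buy": +1.0, "Overweight": +0.5, "Hold": 0.0,
    "Underweight": -0.5, "Sell": -1.0,
}

def adjust(current_weight: float, rating: Rating, step: float = 0.02) -> float:
    """Nudge a portfolio weight by a rating-scaled step."""
    return round(current_weight + TARGET_DELTA[rating] * step, 4)

print(adjust(0.05, "Overweight"))  # 0.06
```

With only 3 tiers, "Overweight" and "Buy" would collapse into the same action; the 5-tier output preserves the distinction between conviction levels.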
5. 10 LLM providers
config["llm_provider"] = "anthropic" # or:
# openai, google, anthropic, xai, deepseek,
# qwen, glm, openrouter, ollama, azure
Local Ollama is the cost-killer if you don't need top-tier reasoning.
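A cost-minimizing local setup might look like this. The config keys match the article's quick start; the Ollama model tags are examples of my choosing, not recommendations from the project.

```python
# Local, zero-API-cost configuration sketch. Model tags are examples;
# any model served by a local Ollama instance should work.
local_config = {
    "llm_provider": "ollama",
    "deep_think_llm": "qwen2.5:32b",   # debates and final decision
    "quick_think_llm": "llama3.1:8b",  # analyst data summarization
    "max_debate_rounds": 1,            # fewer rounds, fewer tokens
}
print(local_config["llm_provider"])  # ollama
```

The trade-off is reasoning quality in the debate layers; for pipeline development and plumbing tests, that usually doesn't matter.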
Quick start
from tradingagents.graph.trading_graph import TradingAgentsGraph
from tradingagents.default_config import DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
config["llm_provider"] = "anthropic"
config["deep_think_llm"] = "claude-opus-4-7"    # debates + final decision
config["quick_think_llm"] = "claude-haiku-4-5"  # analyst summarization
config["max_debate_rounds"] = 2
config["checkpoint_enabled"] = True

ta = TradingAgentsGraph(debug=True, config=config)
_, decision = ta.propagate("NVDA", "2026-01-15")
print(decision)
The split between deep_think_llm and quick_think_llm is the cost-optimization sweet spot:
- deep_think_llm — used for Bull/Bear debate, Risk debate, Portfolio Manager (heavy reasoning)
- quick_think_llm — used for Analyst data summarization (lightweight)
For Claude users, Opus 4.7 + Haiku 4.5 is a clean combo.
CLI mode
tradingagents
Interactive prompt asks for ticker, date, LLM provider, and research depth. Great for quick experiments.
Backtest performance (paper figures)
On AAPL / GOOGL / AMZN:
| Metric | TradingAgents | Baseline |
|---|---|---|
| Cumulative Return | 23.21%–26.62% | lower |
| Annualized Return | up to 30.5% | lower |
| Sharpe Ratio | improved | baseline |
| Max Drawdown | improved | baseline |
Source: arXiv 2412.20138 v7 (Yijia Xiao et al.).
What's NOT in the paper
The honest limitations:
- Slippage is not modeled
- Taxes are not deducted
- Market impact for non-trivial positions is ignored
- Live data latency is assumed to be zero in the backtest
- All numbers are AAPL/GOOGL/AMZN only — large-cap US tech, the easiest regime
Translation: treat it as a research framework. Don't commit real capital based on the paper figures alone.
Generalizing the pattern
The interesting part isn't trading — it's the multi-agent debate pattern itself. The same 5-layer structure could be applied to:
- Content production (writer + editor + SEO + reviewer + final approver)
- Marketing campaigns (strategy + copy + design + measurement)
- Hiring decisions (technical + culture-fit + reference + final)
- Pricing decisions (cost + market + competitive + customer)
Anywhere a single human or a single LLM tends to rationalize its initial take, structured debate between two or more adversarial agents helps.
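Stripped of the trading domain, the core loop is small. The sketch below uses stub functions in place of LLM calls; in practice each role would be a model invocation with an adversarial system prompt ("argue for", "argue against", "judge the transcript").

```python
# Minimal skeleton of the adversarial-debate pattern, domain-agnostic.
# The agents here are stubs standing in for role-prompted LLM calls.

def pro_agent(topic, transcript):
    return f"Pro: {topic} has strong upside given the evidence so far."

def con_agent(topic, transcript):
    return f"Con: {topic} carries unpriced downside risk."

def judge(transcript):
    # A real judge agent would weigh the arguments; this stub just
    # summarizes the record it was handed.
    return {"decision": "Hold", "rounds": len(transcript) // 2}

def debate(topic, rounds=2):
    transcript = []
    for _ in range(rounds):
        # Each side sees the full transcript, so later rounds can rebut.
        transcript.append(pro_agent(topic, transcript))
        transcript.append(con_agent(topic, transcript))
    return judge(transcript)

result = debate("NVDA", rounds=2)
print(result)  # {'decision': 'Hold', 'rounds': 2}
```

The structural trick is that neither side is asked for a balanced view; balance emerges from the judge reading a transcript where both extremes were argued in earnest.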
Final thoughts
TradingAgents is not a "make money with AI" toolkit. It's a research framework that demonstrates the multi-agent paradigm works in a high-stakes domain. The v0.2.4 additions (structured outputs, checkpoints, persistent memory) make it actually usable for serious experimentation — not just paper-friendly demos.
Worth cloning, reading the LangGraph code, and stealing patterns for your own multi-agent systems.
⚠️ Disclaimer: This article analyzes the TradingAgents framework from a technical perspective. It is not investment advice. Backtest numbers are historical and do not guarantee future returns. Always paper trade extensively before deploying real capital.