정상록
TradingAgents v0.2.4: A Multi-Agent LLM Framework That Simulates an Entire Trading Firm

TL;DR

UCLA's Tauric Research released TradingAgents v0.2.4 (2026-04-25), a LangGraph-based multi-agent LLM framework that mimics a real trading firm with 5 layers and roughly a dozen agents. The new release adds Pydantic-typed structured outputs, LangGraph checkpoint resumption, a persistent decision-memory file, a 5-tier rating scale, and integrations for 10 LLM providers. Backtests on AAPL/GOOGL/AMZN show 23-27% cumulative returns.

⚠️ Disclaimer: backtest only. Not financial advice. Paper trade before any real capital.

Why this is interesting beyond "another trading bot"

Most LLM trading bots are a single model with a giant prompt, which makes them highly prone to confirmation bias: once the model forms an initial thesis, it cherry-picks evidence that supports it.

TradingAgents counters this structurally with 5 layers of explicit role-based agents that argue with each other:

[Analyst Team x4] -> [Bull vs Bear debate]
        |
        v
   [Trader (3-tier)]
        |
        v
[Risk Mgmt: Aggressive vs Conservative vs Neutral]
        |
        v
   [Portfolio Manager (5-tier)] -> Buy / Overweight / Hold / Underweight / Sell

The whole pipeline runs on a LangGraph state graph with explicit handoffs: you can replace any node, log any state, and resume from any checkpoint.
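As a rough plain-Python sketch of that layered handoff idea (every agent function and state key below is an invented placeholder, not the framework's actual node code):

```python
# Minimal sketch of the 5-layer handoff pattern. The real framework wires
# these as LangGraph nodes; all logic here is invented for illustration.

def analysts(state):          # Layer 1: four analysts produce reports
    state["reports"] = ["fundamental", "sentiment", "news", "technical"]
    return state

def bull_bear_debate(state):  # Layer 2: adversarial debate over the reports
    state["thesis"] = "bull" if len(state["reports"]) >= 3 else "bear"
    return state

def trader(state):            # Layer 3: 3-tier decision
    state["trade"] = "BUY" if state["thesis"] == "bull" else "SELL"
    return state

def risk_mgmt(state):         # Layer 4: risk debate can veto or soften
    state["risk_ok"] = True
    return state

def portfolio_manager(state): # Layer 5: final 5-tier rating
    state["rating"] = "Overweight" if state["risk_ok"] else "Hold"
    return state

PIPELINE = [analysts, bull_bear_debate, trader, risk_mgmt, portfolio_manager]

def run(ticker):
    state = {"ticker": ticker}
    for node in PIPELINE:     # explicit handoff: each node reads/writes shared state
        state = node(state)
    return state
```

The shared-state dict is what LangGraph formalizes: each node is a pure-ish function of the state, which is what makes per-node logging and checkpointing possible.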

v0.2.4 highlights for developers

1. Structured output decision agents

The Research Manager, Trader, and Portfolio Manager now use llm.with_structured_output(Schema) with Pydantic schemas. This works across:

  • OpenAI: json_schema
  • Google Gemini: response_schema
  • Anthropic Claude: tool-use
  • Other models: function-calling fallback

No more brittle text parsing for decision values.
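A minimal sketch of what such a schema might look like (the field names are illustrative, not the framework's actual Pydantic models; `with_structured_output` is the LangChain call named above):

```python
import json
from typing import Literal
from pydantic import BaseModel, Field

# Illustrative decision schema -- field names are invented for this sketch.
class TradeDecision(BaseModel):
    action: Literal["BUY", "HOLD", "SELL"]
    confidence: float = Field(ge=0.0, le=1.0)
    rationale: str

# With a LangChain chat model you would bind the schema like this:
#   structured_llm = llm.with_structured_output(TradeDecision)
#   decision = structured_llm.invoke(prompt)   # returns a TradeDecision

# Validation replaces brittle text parsing: malformed output raises a
# validation error, well-formed output becomes a typed object.
raw = '{"action": "BUY", "confidence": 0.72, "rationale": "earnings beat"}'
decision = TradeDecision(**json.loads(raw))
```

The win over regexing "BUY" out of free text: out-of-range confidence or a misspelled action fails loudly at the boundary instead of silently corrupting the downstream decision.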

2. LangGraph checkpoint resumption

Pass --checkpoint to enable per-node state persistence in a SQLite file:

~/.tradingagents/cache/checkpoints/<TICKER>.db

If your run crashes after the Bull/Bear debate but before Risk Management, you can resume from the last completed node instead of paying for the full pipeline again, which adds up to significant API cost savings.
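Under the hood this is LangGraph's checkpointer; the idea can be sketched with stdlib sqlite3 (the table layout below is invented for illustration, not the actual checkpoint schema):

```python
import json
import sqlite3

# Stdlib sketch of per-node checkpointing. The real framework uses
# LangGraph's SQLite checkpointer; this schema is invented.
def save_checkpoint(conn, node, state):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints (node TEXT PRIMARY KEY, state TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)", (node, json.dumps(state))
    )
    conn.commit()

def load_checkpoint(conn, node):
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE node = ?", (node,)
    ).fetchone()
    return json.loads(row[0]) if row else None

conn = sqlite3.connect(":memory:")  # the real framework writes <TICKER>.db on disk
save_checkpoint(conn, "bull_bear_debate", {"thesis": "bull", "rounds": 2})

# After a crash: reload the last completed node's state and continue from there
resumed = load_checkpoint(conn, "bull_bear_debate")
```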

3. Persistent decision memory

~/.tradingagents/memory/trading_memory.md

Every run appends a decision record. On the next run for the same ticker, the framework auto-injects:

  • Previous decision
  • Actual realized return (raw + alpha vs SPY)
  • A one-paragraph retrospective

into the Portfolio Manager prompt. This is an automated trading journal used as context: the framework learns from its own past mistakes.
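The mechanic is simple enough to sketch with the stdlib (the record format below is invented; the real file is the `trading_memory.md` path above):

```python
import os
import tempfile
from pathlib import Path

# Sketch of trading-journal-as-context. The markdown record format is
# invented for illustration, not the framework's actual layout.
def append_record(path, ticker, decision, realized_return, retrospective):
    entry = (
        f"## {ticker}\n"
        f"- decision: {decision}\n"
        f"- realized return: {realized_return:+.2%}\n"
        f"- retrospective: {retrospective}\n\n"
    )
    with Path(path).open("a") as f:
        f.write(entry)

def last_record(path, ticker):
    """Return the most recent record for a ticker, ready to inject into a prompt."""
    blocks = Path(path).read_text().split("## ")
    matches = [b for b in blocks if b.startswith(ticker)]
    return "## " + matches[-1] if matches else None

path = os.path.join(tempfile.gettempdir(), "trading_memory_demo.md")  # stand-in file
Path(path).write_text("")
append_record(path, "NVDA", "Overweight", 0.043, "thesis played out, sized too small")
context = last_record(path, "NVDA")
```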

4. 5-tier rating system

The Portfolio Manager now outputs:

| Rating | Meaning |
| --- | --- |
| Buy | strong buy |
| Overweight | increase position |
| Hold | maintain |
| Underweight | reduce position |
| Sell | exit |

The Trader still uses 3-tier (Buy/Hold/Sell). Only the final Portfolio Manager gets the finer granularity.
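One way to see why the finer scale matters for sizing, sketched as an enum (the position-delta mapping is an invented illustration, not anything the framework prescribes):

```python
from enum import Enum

# The 5-tier scale as an enum, plus a hypothetical mapping to position
# adjustments (fraction of max position size). The deltas are invented.
class Rating(Enum):
    BUY = "Buy"
    OVERWEIGHT = "Overweight"
    HOLD = "Hold"
    UNDERWEIGHT = "Underweight"
    SELL = "Sell"

POSITION_DELTA = {
    Rating.BUY: 1.0,
    Rating.OVERWEIGHT: 0.5,
    Rating.HOLD: 0.0,
    Rating.UNDERWEIGHT: -0.5,
    Rating.SELL: -1.0,
}
```

A 3-tier output can only say "in, flat, or out"; the two intermediate tiers let the final decision scale exposure instead of toggling it.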

5. 10 LLM providers

config["llm_provider"] = "anthropic"  # or:
# openai, google, anthropic, xai, deepseek,
# qwen, glm, openrouter, ollama, azure

Running locally via Ollama eliminates API costs entirely if you don't need top-tier reasoning.

Quick start

from tradingagents.graph.trading_graph import TradingAgentsGraph
from tradingagents.default_config import DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
config["llm_provider"] = "anthropic"
config["deep_think_llm"] = "claude-opus-4-7"    # heavy reasoning: debates, final decision
config["quick_think_llm"] = "claude-haiku-4-5"  # lightweight: analyst summarization
config["max_debate_rounds"] = 2
config["checkpoint_enabled"] = True

ta = TradingAgentsGraph(debug=True, config=config)
_, decision = ta.propagate("NVDA", "2026-01-15")  # returns (final state, decision)
print(decision)

The split between deep_think_llm and quick_think_llm is the cost-optimization sweet spot:

  • deep_think_llm — used for Bull/Bear debate, Risk debate, Portfolio Manager (heavy reasoning)
  • quick_think_llm — used for Analyst data summarization (lightweight)

For Claude users, Opus 4.7 + Haiku 4.5 is a clean combo.
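The routing itself is trivial; a sketch of the idea (the role-to-tier mapping mirrors the split above, but the dispatch function and role names are invented):

```python
# Sketch: route each agent role to the cheap or expensive model by
# reasoning load. Role names and this helper are invented for illustration.
DEEP_ROLES = {"bull_bear_debate", "risk_debate", "portfolio_manager"}
QUICK_ROLES = {"analyst_summary"}

def pick_model(role, config):
    if role in DEEP_ROLES:
        return config["deep_think_llm"]    # heavy multi-step reasoning
    if role in QUICK_ROLES:
        return config["quick_think_llm"]   # cheap summarization
    raise ValueError(f"unknown role: {role}")

config = {"deep_think_llm": "claude-opus-4-7", "quick_think_llm": "claude-haiku-4-5"}
```

Since analyst summarization dominates token volume in a run like this, pushing it to the cheap tier is where most of the savings come from.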

CLI mode

tradingagents

An interactive prompt asks for the ticker, date, LLM provider, and research depth. Great for quick experiments.

Backtest performance (paper figures)

On AAPL / GOOGL / AMZN:

| Metric | TradingAgents | Baseline |
| --- | --- | --- |
| Cumulative Return | 23.21~26.62% | lower |
| Annualized Return | up to 30.5% | lower |
| Sharpe Ratio | improved | baseline |
| Max Drawdown | improved | baseline |

Source: arXiv 2412.20138 v7 (Yijia Xiao et al.).
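For reference, these metrics have standard definitions that are easy to compute from a series of daily returns; the sketch below uses the textbook formulas, not necessarily the paper's exact methodology:

```python
import math

# Standard metric definitions over daily returns (textbook formulas, not
# necessarily the paper's exact computation).
def cumulative_return(daily_returns):
    total = 1.0
    for r in daily_returns:
        total *= 1 + r
    return total - 1

def sharpe_ratio(daily_returns, periods=252):
    """Annualized Sharpe (risk-free rate assumed zero for simplicity)."""
    mean = sum(daily_returns) / len(daily_returns)
    var = sum((r - mean) ** 2 for r in daily_returns) / len(daily_returns)
    return (mean / math.sqrt(var)) * math.sqrt(periods) if var else 0.0

def max_drawdown(daily_returns):
    """Worst peak-to-trough decline of the equity curve (negative number)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst
```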

What's NOT in the paper

The honest limitations:

  • Slippage is not modeled
  • Taxes are not deducted
  • Market impact for non-trivial positions is ignored
  • Live-data latency is assumed to be zero in the backtest
  • All numbers come from AAPL/GOOGL/AMZN only: large-cap US tech, the easiest regime

Translation: treat it as a research framework. Don't deploy real capital based on the paper figures alone.

Generalizing the pattern

The interesting part isn't trading — it's the multi-agent debate pattern itself. The same 5-layer structure could be applied to:

  • Content production (writer + editor + SEO + reviewer + final approver)
  • Marketing campaigns (strategy + copy + design + measurement)
  • Hiring decisions (technical + culture-fit + reference + final)
  • Pricing decisions (cost + market + competitive + customer)

Anywhere a single human or single LLM tends to confirm-bias their initial take, structured debate between 2+ adversarial agents helps.
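The core of the pattern fits in a few lines with mock agents (everything below is invented toy logic; the point is the structure: each side must survive the other's rebuttals before a judge decides):

```python
# Toy adversarial-debate loop. Each "agent" returns its arguments minus
# those the other side has rebutted; a judge picks the stronger side.
def bull(topic, rebuttals):
    args = {"growth", "moat", "momentum"}   # invented bull arguments
    return args - rebuttals

def bear(topic, rebuttals):
    args = {"valuation", "momentum"}        # "momentum" contests the bull's point
    return args - rebuttals

def debate(topic, rounds=2):
    bull_args, bear_args = set(), set()
    for _ in range(rounds):
        bull_args = bull(topic, bear_args)  # bull answers bear's standing points
        bear_args = bear(topic, bull_args)  # bear answers bull's standing points
    # judge: the side with more uncontested arguments wins
    return "bull" if len(bull_args) > len(bear_args) else "bear"
```

With real LLM agents the rebuttal step is a prompt that must address the opponent's last message, which is exactly what breaks the single-model cherry-picking loop.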

Final thoughts

TradingAgents is not a "make money with AI" toolkit. It's a research framework that demonstrates the multi-agent paradigm works in a high-stakes domain. The v0.2.4 additions (structured outputs, checkpoints, persistent memory) make it actually usable for serious experimentation — not just paper-friendly demos.

Worth cloning, reading the LangGraph code, and stealing patterns for your own multi-agent systems.

⚠️ Disclaimer: This article analyzes the TradingAgents framework from a technical perspective. It is not investment advice. Backtest numbers are historical and do not guarantee future returns. Always paper trade extensively before deploying real capital.
