We have vibe coding now — so I guess we have vibe investing too. Most "AI stock picker" tools work the same way: you feed in a ticker, and out comes "BUY — confidence 87%" with no way to see why. You're not analyzing anything; you're trusting a vibe. If I can't inspect the reasoning, I can't trust the call, and I definitely can't learn from it.
So I built the opposite: a multi-agent committee where the output isn't a score — it's an auditable trail from raw evidence to a final trade thesis. Bull and bear analysts argue the ticker out, a risk manager signs off, and the final decision is required to cite the specific evidence it rests on. I called it VerumTrade (Latin verum, "truth"). It's open source (Apache-2.0), and this post is about how it's put together, not a pitch.
The core idea: audit the call instead of trusting the vibe
The design constraint that drove everything: at every step, you should be able to read the reasoning and inspect the evidence it was built on. No black box. That turned out to map naturally onto a multi-agent graph, where each stage produces structured output the next stage consumes.
The pipeline runs six stages:
- Analysts gather market, news, social, fundamental, and catalyst evidence in parallel.
- Evidence graph distills those raw findings into structured, deduplicated facts — each with a stable ID.
- Bull/Bear debate — two agents argue the thesis from opposite sides.
- Trader plan turns the debate into an actionable proposal.
- Risk review checks sizing, timing, concentration, and downside.
- Decision records the final rationale plus the full trace.
Each link in that chain is a structured handoff, not a vibe. The debate step was the one that surprised me most: forcing an explicit adversarial pass (a Bear agent whose only job is to attack the thesis) catches a lot of motivated reasoning that a single "analyst" agent happily glosses over.
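To make "structured handoff" concrete, here is a toy sketch of the first two stages (analysts to evidence graph) in plain Python. Each stage consumes typed input, returns typed output, and appends to a trace. Everything in it is hypothetical and for illustration only: the Fact and Trace types, the analyst stubs, the run, distill, and stable_id helpers, and the choice of a SHA-256 hash over a normalized claim as the "stable ID" scheme. It is not VerumTrade's code, and the real stages are LLM calls.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Finding = Tuple[str, str]  # (source, claim) as emitted by an analyst

@dataclass(frozen=True)
class Fact:
    id: str
    source: str
    claim: str

@dataclass
class Trace:
    steps: List[Tuple[str, Any]] = field(default_factory=list)

    def record(self, stage: str, output: Any) -> None:
        self.steps.append((stage, output))

# Hypothetical analyst stubs; real ones would be LLM-backed agents.
def market_analyst(ticker: str) -> List[Finding]:
    return [("market", f"{ticker} closed above its 50-day moving average")]

def news_analyst(ticker: str) -> List[Finding]:
    # The same headline with different casing and spacing, to exercise dedup.
    return [("news", f"{ticker} raised revenue guidance"),
            ("news", f"{ticker}  Raised revenue guidance")]

def stable_id(source: str, claim: str) -> str:
    # Same source + same normalized claim -> same ID on every run.
    normalized = " ".join(claim.lower().split())
    digest = hashlib.sha256(f"{source}|{normalized}".encode()).hexdigest()
    return "F-" + digest[:8]

def distill(findings: List[Finding]) -> List[Fact]:
    facts = {}
    for source, claim in findings:
        fid = stable_id(source, claim)
        facts.setdefault(fid, Fact(fid, source, claim))  # first wording wins
    return list(facts.values())

def run(ticker: str) -> Tuple[List[Fact], Trace]:
    trace = Trace()
    analysts = [market_analyst, news_analyst]
    with ThreadPoolExecutor() as pool:  # analysts run in parallel
        batches = list(pool.map(lambda analyst: analyst(ticker), analysts))
    findings = [f for batch in batches for f in batch]
    trace.record("analysts", findings)
    facts = distill(findings)
    trace.record("evidence_graph", facts)
    # Debate, trader, risk, and decision stages would follow the same
    # pattern: consume typed input, return typed output, record it.
    return facts, trace
```

Calling run("MU") yields two facts from three raw findings, and the trace holds both intermediate outputs. Hashing a normalized claim only catches trivial duplicates; merging facts that say the same thing in different words would need something smarter.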
Standing on a predecessor's shoulders — and what I added
I'll be upfront: VerumTrade started from TradingAgents, an excellent open-source multi-agent trading framework. The committee, the bull/bear debate, the risk discussion — that lineage is theirs, and credit where it's due.
What bugged me about most of these systems (mine included, early on) is that the "reasoning" is just printed. You get nice markdown reports, but the final BUY/SELL/HOLD is effectively text-extracted, and nothing structurally ties the verdict to the facts that produced it. So I added the layer that was missing:
- A typed evidence graph. Every fact, inference, and conflict is a structured object with an id, supporting/contradicting fact IDs, a confidence, and a falsifier, not free-floating prose.
- A decision that must cite its evidence. The final decision contract requires a rationale_evidence_ids field: a non-empty list of the evidence IDs the call rests on. If the model can't point to what justifies the trade, validation fails.
- A schema-validated trade object. Action, order type, and time-in-force are enums; stop_loss and take_profit are required numeric fields; limit/stop prices are checked for coherence. No "BUY, idk, maybe set a stop somewhere."
- A self-audit + decision guard. A validation pass records violations, repairs applied, and whether the final action stayed consistent, and it can abort with a reason instead of shipping a broken plan.
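Here is a minimal sketch of what "must cite its evidence" plus the schema checks could look like, using only the standard library's dataclasses and enums. The field names rationale_evidence_ids, stop_loss, and take_profit come from the description above. Everything else is my assumption, not VerumTrade's actual schema: the TradeDecision and enum names, the enum members, the entry_price and limit_price fields, the validate and guard helpers, and the specific coherence rule that a long entry needs stop_loss < entry < take_profit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set

class Action(Enum):
    BUY = "BUY"
    SELL = "SELL"
    HOLD = "HOLD"

class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"

class TimeInForce(Enum):
    DAY = "day"
    GTC = "gtc"

@dataclass
class TradeDecision:
    action: Action
    order_type: OrderType
    time_in_force: TimeInForce
    entry_price: float
    stop_loss: float
    take_profit: float
    rationale: str
    rationale_evidence_ids: List[str]
    limit_price: Optional[float] = None

def validate(d: TradeDecision, known_ids: Set[str]) -> List[str]:
    """Return every violation found; an empty list means the decision passes."""
    violations = []
    if not d.rationale_evidence_ids:
        violations.append("rationale_evidence_ids is empty")
    unknown = [i for i in d.rationale_evidence_ids if i not in known_ids]
    if unknown:
        violations.append(f"cites unknown evidence: {unknown}")
    prices = (d.entry_price, d.stop_loss, d.take_profit)
    numeric = all(isinstance(p, (int, float)) and p > 0 for p in prices)
    if not numeric:
        violations.append("entry_price, stop_loss and take_profit must be positive numbers")
    if d.order_type is OrderType.LIMIT and d.limit_price is None:
        violations.append("limit order without limit_price")
    # Coherence rules for a long entry only; SELL/HOLD rules omitted here.
    if numeric and d.action is Action.BUY:
        if not (d.stop_loss < d.entry_price < d.take_profit):
            violations.append("long trade needs stop_loss < entry < take_profit")
        if d.limit_price is not None and d.limit_price <= d.stop_loss:
            violations.append("limit_price is at or below stop_loss")
    return violations

def guard(d: TradeDecision, known_ids: Set[str]) -> TradeDecision:
    """Abort with the reasons instead of passing a broken plan downstream."""
    violations = validate(d, known_ids)
    if violations:
        raise ValueError("decision rejected: " + "; ".join(violations))
    return d
```

Collecting all violations instead of stopping at the first one fits the self-audit idea: the trace can record every problem, and guard turns that list into an abort with a reason.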
The point isn't that the predecessor is bad — it's that "show your work" should mean structured, linked, and validated, not printed.
Two things I had to get right
A two-tier LLM setup. Running every agent on a frontier model is slow and expensive; running everything on a cheap model is unreliable on the steps that matter. So routine extraction and summarization run on a fast/cheap tier, while the debate and risk judgment run on a stronger tier. This kept cost sane without gutting quality on the decisions that actually move the recommendation.
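A minimal sketch of that routing: a lookup from pipeline stage to tier, and from tier to model. The stage keys, the Tier enum, the model_for helper, and the model names are placeholders I made up for illustration; substitute whatever your provider actually offers.

```python
from enum import Enum

class Tier(Enum):
    QUICK = "quick"  # fast/cheap: routine extraction and summarization
    DEEP = "deep"    # stronger: debate and risk judgment

# Placeholder model names; substitute real ones from your provider.
TIER_MODELS = {
    Tier.QUICK: "small-fast-model",
    Tier.DEEP: "large-reasoning-model",
}

# Hypothetical stage keys mapped to tiers.
STAGE_TIERS = {
    "analysts": Tier.QUICK,
    "evidence_graph": Tier.QUICK,
    "bull_bear_debate": Tier.DEEP,
    "risk_review": Tier.DEEP,
}

def model_for(stage: str) -> str:
    # Assumption: any stage not listed goes to the stronger tier, so a new
    # stage can't silently get downgraded to the cheap model.
    tier = STAGE_TIERS.get(stage, Tier.DEEP)
    return TIER_MODELS[tier]
```

Defaulting unknown stages to the stronger tier costs a little more, but it means a judgment step is never quietly routed to the weaker model.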
Provider independence. I started on one provider and immediately regretted hardcoding it. The pipeline now runs against OpenAI-compatible endpoints generally — I've run full pipelines on Qwen and other backends by overriding the base URL. If you're building anything multi-agent, decouple from a single vendor early; retrofitting it later is painful.
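One cheap way to keep that decoupling is to resolve the endpoint from configuration rather than code. A sketch, assuming hypothetical environment variable names (LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) and a hypothetical ProviderConfig / load_provider_config pair; the default base URL is OpenAI's public API endpoint, and pointing it at any other OpenAI-compatible server is what switches backends.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# OpenAI's public endpoint; any OpenAI-compatible server can replace it.
DEFAULT_BASE_URL = "https://api.openai.com/v1"

@dataclass(frozen=True)
class ProviderConfig:
    base_url: str
    api_key: str
    model: str

def load_provider_config(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Resolve the LLM endpoint from configuration (variable names are hypothetical)."""
    env = os.environ if env is None else env
    missing = [name for name in ("LLM_API_KEY", "LLM_MODEL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")
    return ProviderConfig(
        base_url=env.get("LLM_BASE_URL", DEFAULT_BASE_URL).rstrip("/"),
        api_key=env["LLM_API_KEY"],
        model=env["LLM_MODEL"],
    )
```

With the official openai Python SDK, the result plugs straight into OpenAI(base_url=cfg.base_url, api_key=cfg.api_key), and every agent calls client.chat.completions.create(model=cfg.model, ...) the same way no matter which backend sits behind the URL.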
The piece I'm most proud of is a crowding / macro-pullback awareness check — a guard that flags when a thesis is leaning on a crowded, macro-sensitive setup that looks great right up until it doesn't. It came directly from watching the naive version confidently recommend names that were one Fed headline away from unwinding.
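The check itself isn't spelled out here, so treat the following as a hypothetical illustration of the shape such a guard could take, not VerumTrade's logic: it flags a setup that scores high on both a crowding proxy and a macro-sensitivity proxy. The SetupSignals inputs (a 0-to-1 crowding score, a beta to a macro factor, a momentum flag), the crowding_macro_flags helper, and both thresholds are invented for the example and not calibrated.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SetupSignals:
    # All three inputs are hypothetical and assumed to be computed upstream.
    crowding_score: float   # 0..1 proxy for how crowded the trade is
    macro_beta: float       # sensitivity of returns to a macro factor such as rates
    momentum_driven: bool   # does the bull case rest mostly on recent price action?

def crowding_macro_flags(
    s: SetupSignals,
    crowding_threshold: float = 0.7,  # illustrative cutoff, not calibrated
    beta_threshold: float = 1.5,      # illustrative cutoff, not calibrated
) -> List[str]:
    """Return human-readable reasons to distrust the setup; empty means no flag."""
    flags = []
    crowded = s.crowding_score >= crowding_threshold
    if crowded and abs(s.macro_beta) >= beta_threshold:
        flags.append("crowded and macro-sensitive: one macro surprise could force an unwind")
    if crowded and s.momentum_driven:
        flags.append("bull case leans on momentum in a crowded name")
    return flags
```

Returning reasons rather than a bare boolean keeps the guard consistent with the rest of the design: whoever reads the trace sees why it fired, not just that it did.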
What it looks like to use
There's a web app, a CLI, and a plain Python API. The programmatic entry point is about as minimal as I could make it:
```python
from verumtrade import run_pipeline

# Returns the full state plus a structured trade decision
result = run_pipeline(ticker="MU")

print(result.decision.rationale)               # the human-readable thesis
print(result.decision.rationale_evidence_ids)  # the evidence IDs the call rests on
print(result.traces)                           # evidence -> debate -> decision, step by step
```
The output isn't just a verdict — result.traces is the whole reasoning chain, and every decision points back at the evidence that justifies it.
Honest disclaimer
This is a research and decision-support tool, not an oracle and not financial advice. Market data can be delayed or wrong, LLM outputs can be wrong, and trading involves real risk of loss. I treat its output as a structured second opinion — something to challenge, not obey.
Where it's at
It's early and I'm actively building. If the architecture is interesting to you, or you want to poke holes in the multi-agent design, the repo is here: https://github.com/muye1202/VerumTrade
I'd genuinely like feedback on the evidence graph and the "decision must cite its evidence" constraint. What would you want to see in the trace to actually trust a recommendation? That question is most of why I open-sourced it.