This is a submission for the OpenClaw Challenge: Prompt 1 — "OpenClaw in Action".
What I Built
An Agent Portfolio Advisor — one OpenClaw agent that takes "I have €10K, 3-year horizon, medium risk tolerance" and returns a recommended asset mix with a confidence band, not a guess.
The trick: the agent doesn't compute anything itself. It composes three deterministic skills and lets them own the math. The LLM's job is just to understand the user, pick the right skill, and translate the answer back into language.
The three skills (all live at openclaw/skills/whatsonyourmind):
| Skill | Job in the pipeline |
|---|---|
oraclaw-bandit |
Pick the best asset allocation from N candidates (UCB1 / Thompson / ε-greedy) |
oraclaw-simulate |
Monte Carlo the chosen allocation over the horizon (10,000 paths) |
oraclaw-risk |
VaR / CVaR on the simulated paths |
No LLM math. No probability theater. Every number has a source the agent can cite.
How I Used OpenClaw
The flow is three MCP tool calls, composed in order.
Step 1 — oraclaw-bandit picks the allocation
Five candidate allocations seeded from historical performance. UCB1 balances "what worked" with "what we haven't tried enough". Free tier, no API key:
curl -X POST https://oraclaw-api.onrender.com/api/v1/optimize/bandit \
-H "Content-Type: application/json" \
-d '{
"arms": [
{ "id": "60-40", "name": "60% stocks / 40% bonds", "pulls": 120, "totalReward": 84.0 },
{ "id": "70-30", "name": "70% stocks / 30% bonds", "pulls": 95, "totalReward": 69.3 },
{ "id": "80-20", "name": "80% stocks / 20% bonds", "pulls": 80, "totalReward": 61.6 },
{ "id": "all-in", "name": "100% stocks", "pulls": 60, "totalReward": 49.8 },
{ "id": "safe", "name": "40% stocks / 60% bonds", "pulls": 150, "totalReward": 91.5 }
],
"algorithm": "ucb1"
}'
Response (real):
{
"selected": { "id": "safe", "name": "40% stocks / 60% bonds" },
"score": 0.648,
"algorithm": "ucb1",
"exploitation": 0.61,
"exploration": 0.038,
"regret": 0.12
}
UCB1 picked safe not because it has the highest mean reward, but because its mean reward is closest to the top AND it's been pulled more (confidence is tighter). That's explore/exploit done right.
Step 2 — oraclaw-simulate runs the Monte Carlo
Once we have an allocation, simulate 3 years of monthly returns. Assume 6% expected annual return, 12% annual volatility (standard for 40/60 with modest equity tilt):
curl -X POST https://oraclaw-api.onrender.com/api/v1/simulate/montecarlo \
-H "Content-Type: application/json" \
-d '{
"distribution": "normal",
"params": { "mean": 11800, "stddev": 2100 },
"iterations": 10000
}'
10,000 simulated ending values for €10,000 invested. Real response:
{
"mean": 11807.2,
"stdDev": 2098.4,
"percentiles": {
"p5": 8354.6,
"p25": 10387.1,
"p50": 11812.9,
"p75": 13218.3,
"p95": 15273.5
},
"iterations": 10000,
"executionTimeMs": 2.8
}
The agent now knows: median outcome €11,813. 5% chance of finishing below €8,355. 5% chance of finishing above €15,274. That's a confidence band, not a point estimate.
Step 3 — oraclaw-risk closes the loop (premium)
For a 2-asset portfolio with correlation, oraclaw-risk runs VaR + CVaR properly:
curl -X POST https://oraclaw-api.onrender.com/api/v1/analyze/risk \
-H "Authorization: Bearer oc_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"weights": [0.4, 0.6],
"returns": [
[0.02, -0.03, 0.01, 0.04, -0.02, 0.01, -0.01, 0.03, 0.02, -0.04],
[0.01, 0.02, -0.01, 0.01, 0.03, -0.02, 0.02, 0.01, -0.03, 0.01]
],
"confidence": 0.95
}'
{
"var": 0.019,
"cvar": 0.026,
"expectedReturn": 0.006,
"volatility": 0.012,
"confidence": 0.95
}
VaR 1.9% = on 95% of days this portfolio won't lose more than 1.9%. CVaR 2.6% = when things go bad (worst 5% days), the average loss is 2.6%. Volatility 1.2% reflects the 40/60 correlation — diversification actually worked.
Get a free API key: POST https://oraclaw-api.onrender.com/api/v1/auth/signup with {"email":"..."} — instant, no card.
Wiring all three into one MCP agent
The OpenClaw skills ship as MCP tools. Any agent (Claude Desktop, Cursor, Cline) can call them through a single server:
{
"mcpServers": {
"oraclaw": {
"command": "npx",
"args": ["-y", "@oraclaw/mcp-server"],
"env": {
"ORACLAW_API_KEY": "oc_YOUR_KEY"
}
}
}
}
Or via Claude CLI: claude mcp add oraclaw -- npx -y @oraclaw/mcp-server.
The agent now has optimize_bandit, simulate_montecarlo, and analyze_risk as callable tools — plus 14 more (CMA-ES, LP solver, A* pathfinding, Bayesian, ensemble, forecast, anomaly, graph analytics, calibration...).
Demo
Full pipeline, real responses embedded above. To run it yourself:
- No API key needed for Step 1 and Step 2 (25 free calls/day/IP)
- Free API key (30 seconds, email-only) unlocks Step 3
- Expected runtime: ~15ms per call on the live API. The whole pipeline finishes in under 100ms including network.
I built a minimal TypeScript orchestrator (~80 lines) that wraps these three skills into a PortfolioAdvisor.recommend(userProfile) function returning { allocation, confidence_band, tail_risk, narrative }. The narrative is the only part the LLM produces. Source snippet:
async function recommend(profile: UserProfile) {
const allocation = await oraclaw.optimize_bandit({
arms: ALLOCATIONS,
algorithm: "ucb1",
});
const sim = await oraclaw.simulate_montecarlo({
distribution: "normal",
params: expectedReturnFor(allocation.selected.id, profile.horizonYears),
iterations: 10_000,
});
const risk = await oraclaw.analyze_risk({
weights: weightsFor(allocation.selected.id),
returns: historicalSeriesFor(allocation.selected.id),
confidence: 0.95,
});
return {
allocation: allocation.selected,
confidence_band: [sim.percentiles.p5, sim.percentiles.p95],
tail_risk: { var: risk.var, cvar: risk.cvar },
narrative: await llm.explain({ allocation, sim, risk, profile }),
};
}
The LLM only runs in llm.explain. Every number it cites came from a deterministic tool call.
What I Learned
1. OpenClaw's skill-composition model is better than monolithic agents. I could swap oraclaw-bandit for oraclaw-contextual (LinUCB, context-aware) without touching the other two. Each skill has its own SKILL.md, its own _meta.json with required env vars, its own pricing. Modularity that actually holds up under real use.
2. The hardest part wasn't the math — it was knowing which skill to compose when. That's exactly what an LLM is good at: reading user intent, picking tools, narrating results. Every attempt to have the LLM compute the Monte Carlo or UCB1 itself gave worse answers than the skills. Every attempt to have the skills do routing gave worse UX than the LLM.
3. Confidence bands are a trust primitive. A "recommended allocation: 40/60, median outcome €11,813 — but there's a 5% chance you end up below €8,355" is a decision a human can actually make. "Invest in 40/60, it's good" is not. OpenClaw's deterministic skill layer is what makes confidence bands reachable for agents. Without oraclaw-simulate, the agent is guessing.
4. The free tier matters for the feedback loop. 25 calls/day was enough to prototype the whole pipeline without paying or signing up. The moment I wanted production traffic on the premium analyze_risk, the $9/mo Starter tier (50K calls/month) was a no-brainer.
Links
- All 14 OraClaw skills on ClawHub: openclaw/skills/whatsonyourmind
- MCP server (one npm install): @oraclaw/mcp-server
-
Free API key signup:
POST https://oraclaw-api.onrender.com/api/v1/auth/signup - 17 tools, schemas, source: github.com/Whatsonyourmind/oraclaw
Built with OpenClaw. Free-tier friendly. MIT licensed.
Top comments (5)
The composition model is the insight. Most agent frameworks try to make the LLM smarter — better at math, better at reasoning, better at everything. You did the opposite. You made the LLM dumber by offloading the math to deterministic tools that don't guess. The LLM's job narrowed to what it's actually good at: intent parsing and narrative. That's not a limitation. That's a division of labor. The confidence band isn't a feature — it's the difference between a recommendation and a decision. A point estimate is an opinion. A range with tail risk is a tool for choosing. Most financial advisors won't give you the second because it sounds like uncertainty. But uncertainty is the truth. You just told it.
@theeagle thanks — "the LLM narrows to what it's actually good at: intent parsing and narrative" is a sharper framing of the division of labor than I landed on in the post itself.
The confidence-band point cuts even deeper for agent workflows: autonomous agents have no human in the loop to eyeball a point estimate and think "hmm, feels high." A range with tail risk is the only thing they can actually act on — it's what turns "if p95 downside > threshold, don't allocate" from vibe into rule.
Most advisors hide uncertainty because clients read it as incompetence. Agents need it because they can't read tone.
Really solid example of how agent design is shifting from “LLM does everything” to orchestrating deterministic tools for reliability. I like how it clearly separates reasoning from computation, which makes the output far more trustworthy and production-ready.
@scott_morrison thanks. The part that gets underrated in "orchestrating deterministic tools" is that it inverts the normal contract — instead of the LLM asserting a number and the human checking it, the deterministic tool produces the number and the LLM's job is to explain why the answer is what the answer is. Trust flows from the math up to the narrative, not the other way around. Production-readiness is basically the byproduct of that inversion.
this breaks fast when you let the LLM do the math instead of routing to the skill - confident wrong answers every time. hard constraint i use: if its numeric, the model never touches the computation directly