Whatsonyourmind

I Built an Agent Portfolio Advisor by Composing 3 OpenClaw Skills — Here's What Actually Works

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Challenge: Prompt 1 — "OpenClaw in Action".

What I Built

An Agent Portfolio Advisor — one OpenClaw agent that takes "I have €10K, 3-year horizon, medium risk tolerance" and returns a recommended asset mix with a confidence band, not a guess.

The trick: the agent doesn't compute anything itself. It composes three deterministic skills and lets them own the math. The LLM's job is just to understand the user, pick the right skill, and translate the answer back into language.

The three skills (all live at openclaw/skills/whatsonyourmind):

Skill            | Job in the pipeline
-----------------|------------------------------------------------------------------------------
oraclaw-bandit   | Pick the best asset allocation from N candidates (UCB1 / Thompson / ε-greedy)
oraclaw-simulate | Monte Carlo the chosen allocation over the horizon (10,000 paths)
oraclaw-risk     | VaR / CVaR on the simulated paths

No LLM math. No probability theater. Every number has a source the agent can cite.

How I Used OpenClaw

The flow is three MCP tool calls, composed in order.

Step 1 — oraclaw-bandit picks the allocation

Five candidate allocations seeded from historical performance. UCB1 balances "what worked" with "what we haven't tried enough". Free tier, no API key:

curl -X POST https://oraclaw-api.onrender.com/api/v1/optimize/bandit \
  -H "Content-Type: application/json" \
  -d '{
    "arms": [
      { "id": "60-40",  "name": "60% stocks / 40% bonds", "pulls": 120, "totalReward": 84.0 },
      { "id": "70-30",  "name": "70% stocks / 30% bonds", "pulls": 95,  "totalReward": 69.3 },
      { "id": "80-20",  "name": "80% stocks / 20% bonds", "pulls": 80,  "totalReward": 61.6 },
      { "id": "all-in", "name": "100% stocks",            "pulls": 60,  "totalReward": 49.8 },
      { "id": "safe",   "name": "40% stocks / 60% bonds", "pulls": 150, "totalReward": 91.5 }
    ],
    "algorithm": "ucb1"
  }'

Response (real):

{
  "selected": { "id": "safe", "name": "40% stocks / 60% bonds" },
  "score": 0.648,
  "algorithm": "ucb1",
  "exploitation": 0.61,
  "exploration": 0.038,
  "regret": 0.12
}

The response decomposes the score: exploitation 0.61 is just safe's mean reward (91.5 / 150), and the small 0.038 exploration bonus reflects that with 150 pulls, the most of any arm, its estimate is the tightest of the five. UCB1 trades the higher raw means of less-pulled arms against the wider uncertainty around them. That's explore/exploit made explicit: every component of the decision is right there in the payload.
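
For intuition, textbook UCB1 scores each arm as mean reward plus an exploration bonus of sqrt(2 · ln(totalPulls) / pulls). Here's a minimal sketch (my own, not the API's internals; the hosted API evidently scales its exploration term differently, since its reported bonus of 0.038 is much smaller than the textbook value would be):

```typescript
// Textbook UCB1: score = mean reward + sqrt(2 * ln(totalPulls) / pulls).
// Sketch of the algorithm family only -- the hosted API's exact scoring
// and normalization are not documented here and clearly differ.
interface Arm {
  id: string;
  pulls: number;
  totalReward: number;
}

function ucb1Scores(arms: Arm[]): Map<string, number> {
  const totalPulls = arms.reduce((n, a) => n + a.pulls, 0);
  const scores = new Map<string, number>();
  for (const arm of arms) {
    const mean = arm.totalReward / arm.pulls; // exploitation term
    const bonus = Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls); // exploration term
    scores.set(arm.id, mean + bonus);
  }
  return scores;
}
```

The bonus shrinks as an arm accumulates pulls, which is exactly why heavily-sampled arms get tight confidence and rarely-sampled arms stay attractive to explore.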

Step 2 — oraclaw-simulate runs the Monte Carlo

Once we have an allocation, simulate 3 years of monthly returns. Assume 6% expected annual return and 12% annual volatility (a reasonable ballpark for a 40/60 portfolio); the params below are the implied 3-year terminal-value mean and stddev for €10,000 invested:
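
As a sanity check on where those params come from (my own back-of-envelope, not part of the API): compounding gives a terminal mean of 10,000 · 1.06³ ≈ €11,910 and a terminal stddev of 10,000 · 0.12 · √3 ≈ €2,078, close to the 11,800 / 2,100 used in the call:

```typescript
// Back-of-envelope conversion of annual assumptions into horizon
// terminal-value parameters. My own sketch; a lognormal terminal
// distribution would be more precise than the normal used here.
function terminalParams(
  principal: number,
  annualReturn: number,
  annualVol: number,
  years: number,
) {
  const mean = principal * Math.pow(1 + annualReturn, years);
  // volatility compounds with the square root of time
  const stddev = principal * annualVol * Math.sqrt(years);
  return { mean: Math.round(mean), stddev: Math.round(stddev) };
}

// terminalParams(10_000, 0.06, 0.12, 3) → { mean: 11910, stddev: 2078 }
```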

curl -X POST https://oraclaw-api.onrender.com/api/v1/simulate/montecarlo \
  -H "Content-Type: application/json" \
  -d '{
    "distribution": "normal",
    "params": { "mean": 11800, "stddev": 2100 },
    "iterations": 10000
  }'

10,000 simulated ending values for €10,000 invested. Real response:

{
  "mean": 11807.2,
  "stdDev": 2098.4,
  "percentiles": {
    "p5":  8354.6,
    "p25": 10387.1,
    "p50": 11812.9,
    "p75": 13218.3,
    "p95": 15273.5
  },
  "iterations": 10000,
  "executionTimeMs": 2.8
}

The agent now knows: median outcome €11,813. 5% chance of finishing below €8,355. 5% chance of finishing above €15,274. That's a confidence band, not a point estimate.
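
Under the hood this is nothing exotic: draw N samples from the configured distribution, sort, read off percentiles. A minimal sketch of the idea (mine, for illustration; the hosted skill does this server-side in ~3ms):

```typescript
// Minimal Monte Carlo sketch: draw normal samples via Box-Muller and
// extract percentiles. Illustrative only, not the hosted skill's code.
function simulate(mean: number, stddev: number, iterations: number) {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const u1 = Math.random() || Number.EPSILON; // avoid log(0)
    const u2 = Math.random();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    samples.push(mean + stddev * z);
  }
  samples.sort((a, b) => a - b);
  const pct = (p: number) => samples[Math.floor((p / 100) * iterations)];
  return { p5: pct(5), p50: pct(50), p95: pct(95) };
}
```

With mean 11800 and stddev 2100 the p5/p50/p95 come out near 8,350 / 11,800 / 15,250, matching the live response above to within sampling noise.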

Step 3 — oraclaw-risk closes the loop (premium)

For a 2-asset portfolio with correlation, oraclaw-risk runs VaR + CVaR properly:

curl -X POST https://oraclaw-api.onrender.com/api/v1/analyze/risk \
  -H "Authorization: Bearer oc_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "weights": [0.4, 0.6],
    "returns": [
      [0.02, -0.03, 0.01, 0.04, -0.02, 0.01, -0.01, 0.03, 0.02, -0.04],
      [0.01, 0.02, -0.01, 0.01, 0.03, -0.02, 0.02, 0.01, -0.03, 0.01]
    ],
    "confidence": 0.95
  }'

Response:

{
  "var": 0.019,
  "cvar": 0.026,
  "expectedReturn": 0.006,
  "volatility": 0.012,
  "confidence": 0.95
}

VaR 1.9% = on 95% of days this portfolio won't lose more than 1.9%. CVaR 2.6% = when things go bad (worst 5% days), the average loss is 2.6%. Volatility 1.2% reflects the 40/60 correlation — diversification actually worked.
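
Those numbers are reproducible from the inputs. Weighting the two return series gives portfolio volatility σ ≈ 1.18%, and a parametric 95% VaR of 1.645σ ≈ 1.9% lines up with the response. A sketch of that reconstruction (mine; the hosted skill's exact method isn't documented here):

```typescript
// Parametric (variance-covariance) VaR sketch for a weighted portfolio.
// My reconstruction of how the response's volatility and VaR arise;
// not necessarily the hosted skill's exact implementation.
function parametricVar(weights: number[], returns: number[][], z = 1.645) {
  const n = returns[0].length;
  // per-period portfolio return = weighted sum of asset returns
  const port = Array.from({ length: n }, (_, t) =>
    weights.reduce((sum, w, i) => sum + w * returns[i][t], 0),
  );
  const mean = port.reduce((s, r) => s + r, 0) / n;
  const variance =
    port.reduce((s, r) => s + (r - mean) ** 2, 0) / (n - 1); // sample variance
  const volatility = Math.sqrt(variance);
  return { expectedReturn: mean, volatility, var95: z * volatility };
}
```

Running it on the exact weights and series from the request reproduces volatility ≈ 0.0118 and VaR ≈ 0.0195, i.e. the 1.2% and 1.9% in the response.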

Get a free API key: POST https://oraclaw-api.onrender.com/api/v1/auth/signup with {"email":"..."} — instant, no card.

Wiring all three into one MCP agent

The OpenClaw skills ship as MCP tools. Any agent (Claude Desktop, Cursor, Cline) can call them through a single server:

{
  "mcpServers": {
    "oraclaw": {
      "command": "npx",
      "args": ["-y", "@oraclaw/mcp-server"],
      "env": {
        "ORACLAW_API_KEY": "oc_YOUR_KEY"
      }
    }
  }
}

Or via Claude CLI: claude mcp add oraclaw -- npx -y @oraclaw/mcp-server.

The agent now has optimize_bandit, simulate_montecarlo, and analyze_risk as callable tools — plus 14 more (CMA-ES, LP solver, A* pathfinding, Bayesian, ensemble, forecast, anomaly, graph analytics, calibration...).

Demo

Full pipeline, real responses embedded above. To run it yourself:

  1. No API key needed for Step 1 and Step 2 (25 free calls/day/IP)
  2. Free API key (30 seconds, email-only) unlocks Step 3
  3. Expected runtime: ~15ms per call on the live API. The whole pipeline finishes in under 100ms including network.

I built a minimal TypeScript orchestrator (~80 lines) that wraps these three skills into a PortfolioAdvisor.recommend(userProfile) function returning { allocation, confidence_band, tail_risk, narrative }. The narrative is the only part the LLM produces. Source snippet:

async function recommend(profile: UserProfile) {
  const allocation = await oraclaw.optimize_bandit({
    arms: ALLOCATIONS,
    algorithm: "ucb1",
  });
  const sim = await oraclaw.simulate_montecarlo({
    distribution: "normal",
    params: expectedReturnFor(allocation.selected.id, profile.horizonYears),
    iterations: 10_000,
  });
  const risk = await oraclaw.analyze_risk({
    weights: weightsFor(allocation.selected.id),
    returns: historicalSeriesFor(allocation.selected.id),
    confidence: 0.95,
  });
  return {
    allocation: allocation.selected,
    confidence_band: [sim.percentiles.p5, sim.percentiles.p95],
    tail_risk: { var: risk.var, cvar: risk.cvar },
    narrative: await llm.explain({ allocation, sim, risk, profile }),
  };
}

The LLM only runs in llm.explain. Every number it cites came from a deterministic tool call.
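
The helpers the snippet leans on are thin lookups. Roughly (illustrative shapes, not the exact repo code; the per-allocation return/volatility assumptions here are placeholders):

```typescript
// Illustrative shapes for the helpers referenced in recommend().
// WEIGHTS maps allocation id -> [stocks, bonds]; ANNUAL holds placeholder
// capital-market assumptions per allocation.
const WEIGHTS: Record<string, number[]> = {
  "60-40": [0.6, 0.4],
  "70-30": [0.7, 0.3],
  "80-20": [0.8, 0.2],
  "all-in": [1.0, 0.0],
  "safe": [0.4, 0.6],
};

const ANNUAL: Record<string, { ret: number; vol: number }> = {
  "safe": { ret: 0.06, vol: 0.12 }, // the 40/60 assumptions from Step 2
  "60-40": { ret: 0.07, vol: 0.14 },
  "70-30": { ret: 0.075, vol: 0.16 },
  "80-20": { ret: 0.08, vol: 0.18 },
  "all-in": { ret: 0.09, vol: 0.22 },
};

function weightsFor(id: string): number[] {
  return WEIGHTS[id] ?? [0.5, 0.5];
}

function expectedReturnFor(id: string, horizonYears: number, principal = 10_000) {
  const { ret, vol } = ANNUAL[id] ?? ANNUAL["safe"];
  return {
    mean: principal * Math.pow(1 + ret, horizonYears),
    stddev: principal * vol * Math.sqrt(horizonYears),
  };
}
```

Swapping the assumption table for real historical estimates is the obvious next step; the point is that even these helpers are deterministic data, not LLM output.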

What I Learned

1. OpenClaw's skill-composition model is better than monolithic agents. I could swap oraclaw-bandit for oraclaw-contextual (LinUCB, context-aware) without touching the other two. Each skill has its own SKILL.md, its own _meta.json with required env vars, its own pricing. Modularity that actually holds up under real use.

2. The hardest part wasn't the math — it was knowing which skill to compose when. That's exactly what an LLM is good at: reading user intent, picking tools, narrating results. Every attempt to have the LLM compute the Monte Carlo or UCB1 itself gave worse answers than the skills. Every attempt to have the skills do routing gave worse UX than the LLM.

3. Confidence bands are a trust primitive. A "recommended allocation: 40/60, median outcome €11,813 — but there's a 5% chance you end up below €8,355" is a decision a human can actually make. "Invest in 40/60, it's good" is not. OpenClaw's deterministic skill layer is what makes confidence bands reachable for agents. Without oraclaw-simulate, the agent is guessing.

4. The free tier matters for the feedback loop. 25 calls/day was enough to prototype the whole pipeline without paying or signing up. The moment I wanted production traffic on the premium analyze_risk, the $9/mo Starter tier (50K calls/month) was a no-brainer.


Links

Built with OpenClaw. Free-tier friendly. MIT licensed.

Top comments (5)

Victor Okefie

The composition model is the insight. Most agent frameworks try to make the LLM smarter — better at math, better at reasoning, better at everything. You did the opposite. You made the LLM dumber by offloading the math to deterministic tools that don't guess. The LLM's job narrowed to what it's actually good at: intent parsing and narrative. That's not a limitation. That's a division of labor. The confidence band isn't a feature — it's the difference between a recommendation and a decision. A point estimate is an opinion. A range with tail risk is a tool for choosing. Most financial advisors won't give you the second because it sounds like uncertainty. But uncertainty is the truth. You just told it.

Whatsonyourmind

@theeagle thanks — "the LLM narrows to what it's actually good at: intent parsing and narrative" is a sharper framing of the division of labor than I landed on in the post itself.

The confidence-band point cuts even deeper for agent workflows: autonomous agents have no human in the loop to eyeball a point estimate and think "hmm, feels high." A range with tail risk is the only thing they can actually act on — it's what turns "if p95 downside > threshold, don't allocate" from vibe into rule.

Most advisors hide uncertainty because clients read it as incompetence. Agents need it because they can't read tone.

Knowband

Really solid example of how agent design is shifting from “LLM does everything” to orchestrating deterministic tools for reliability. I like how it clearly separates reasoning from computation, which makes the output far more trustworthy and production-ready.

Whatsonyourmind

@scott_morrison thanks. The part that gets underrated in "orchestrating deterministic tools" is that it inverts the normal contract — instead of the LLM asserting a number and the human checking it, the deterministic tool produces the number and the LLM's job is to explain why the answer is what the answer is. Trust flows from the math up to the narrative, not the other way around. Production-readiness is basically the byproduct of that inversion.

Mykola Kondratiuk

this breaks fast when you let the LLM do the math instead of routing to the skill - confident wrong answers every time. hard constraint i use: if its numeric, the model never touches the computation directly