Intro
Every vendor with a SaaS dashboard claims a win rate. Backtests are cheap. Self-reported numbers are cheaper still. But when you are building an AI trading agent that autonomously routes capital across venues, "trust me" is not a risk model.
AlgoVault's live record stands at a 90.3% PFE win rate across 83,480+ verified calls, Merkle-verified on Base L2. Unlike a PDF in a marketing deck, those numbers are anchored to immutable on-chain state that any agent can independently verify at query time: each call that contributes to the figure has a Merkle leaf, each batch has a root published to Base L2, and any consumer can fetch an inclusion proof and verify it against public blockchain state without trusting AlgoVault's servers. Don't trust — verify.
This post unpacks the architecture behind that claim, shows the MCP API surface your agent uses to query it, and documents the real failure modes you will hit before a proof verification step works reliably in production.
The Problem with Unverifiable Track Records
AI trading agents consume signals from a growing ecosystem of providers. The evaluation problem is structurally broken at both ends.
On the provider side: a vendor publishes a headline accuracy number — "87% win rate, 12,000 calls" — with no external verification mechanism. The call history lives in a database the vendor controls. Outcomes can be amended retroactively, calls can be pruned from the published set, and backtest parameters can be optimised after the fact to fit a chosen historical window. Even providers who genuinely want to be transparent have no standard for publishing call-level proof in a form that a programmatic consumer can verify without trusting the provider's API.
On the consumer side: an agent builder integrating a third-party signal source has no runtime circuit breaker. If the provider's stated accuracy is overstated, stale, or cohort-specific in a way the headline obscures, the agent has no way to detect it during execution. The composition risk compounds in multi-agent architectures where one sub-agent's capital allocation depends on another sub-agent's accuracy assumption. A single bad data source propagates silently through the stack.
The structural reason is business-model misalignment. Centralised raw-data incumbents provide extensive historical data but no on-chain verification layer — their business model is subscription access to data, not falsifiable accuracy commitments that make their marketing claims trivially testable. Open-source indicator aggregators deliver raw price-derived numbers but no normalised directional accuracy signal. Neither category solves the core problem: a machine-readable, tamper-evident proof that a specific call was made at a specific timestamp and that its stated directional outcome followed.
Merkle anchoring solves this because it makes the claim falsifiable. Publishing a root to a public L2 means any gap between the on-chain commitment and the call history returned by the API is cryptographically detectable — by anyone, at any time, without the provider's cooperation. That is precisely why most incumbents have no incentive to adopt it.
The AlgoVault Answer: Moat #2 in Concrete Terms
AlgoVault's Moat #2 is not a marketing page. It is a cryptographic structure that any agent can traverse independently.
How the anchoring works, layer by layer:
- Calls accumulate in a batch. Each verified call — asset ticker, direction, confidence score, entry timestamp, PFE outcome window — is hashed using standard SHA-256 leaf construction and assembled into a Merkle tree.
- The Merkle root is written to Base L2. Base is an EVM-equivalent L2 with low transaction costs and public RPC access. Any Ethereum-compatible client can read the anchored roots without proprietary tooling.
- A proof endpoint returns the inclusion branch. Given a call ID, /api/track-record/proof returns the Merkle branch connecting that call's leaf to the anchored root. The consumer verifies locally — no additional trust in AlgoVault's infrastructure required.
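The leaf-and-root construction in the first two steps can be sketched in a few lines. This is a minimal illustration, not AlgoVault's actual encoding: the call field set, the JSON serialisation of the leaf, and the odd-node duplication rule are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical call-record shape — field names are assumptions for illustration.
interface VerifiedCall {
  id: string;
  asset: string;
  direction: "long" | "short";
  confidence: number;
  entryTimestamp: number;
}

const sha256 = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Leaf: SHA-256 of a canonical serialisation of the call (assumed JSON here).
const leafHash = (call: VerifiedCall): Buffer =>
  sha256(Buffer.from(JSON.stringify(call)));

// Root: pairwise-hash each level; duplicate the last node on odd-sized levels.
function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) throw new Error("empty batch");
  let level = leaves;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate last node if odd
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0];
}
```

Once the root of a batch is in a transaction on Base L2, amending or pruning any call in that batch changes the recomputed root and the tampering becomes detectable.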
The PFE (Price-Following Efficiency) win rate is the only public accuracy metric. It measures whether a call's directional thesis was confirmed by price action within the holding window calibrated per asset class. A call is counted as a win if price moved in the stated direction before the defined exit; a loss otherwise. No survivorship filter is applied. No calls are excluded post-hoc. The entire call population is committed to the Merkle tree before outcomes are resolved — the commitment precedes the outcome, which is what makes it tamper-evident rather than merely claimed.
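The win/loss rule above reduces to a one-line predicate over the full population. A sketch, using a hypothetical resolved-call shape — the real outcome schema is not public:

```typescript
// Hypothetical resolved-call shape; field names are assumptions.
interface ResolvedCall {
  direction: "long" | "short";
  entryPrice: number;
  exitPrice: number; // price at the defined exit of the holding window
}

// A call is a win if price moved in the stated direction before the exit.
const isWin = (c: ResolvedCall): boolean =>
  c.direction === "long" ? c.exitPrice > c.entryPrice : c.exitPrice < c.entryPrice;

// Win rate over the entire committed population — no survivorship filter,
// no post-hoc exclusions.
const pfeWinRate = (calls: ResolvedCall[]): number =>
  calls.length === 0 ? 0 : calls.filter(isWin).length / calls.length;
```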
For multi-agent systems, this creates a concrete verification step that belongs in the agent loop itself. The execution sub-agent reads _algovault.proof_url from a signal response, fetches the Merkle branch, and verifies it against the Base L2 root via a standard eth_call before allowing trade logic to proceed. "We provide the thesis, agents decide execution" — and the thesis is cryptographically committed before any agent touches it.
Implementation Walkthrough: Querying the Verified Track Record via MCP
Block 1 — Install and first call
// @modelcontextprotocol/sdk@^1.x @algovaultlabs/mcp-server@^2.x
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
const transport = new StdioClientTransport({
command: "npx",
args: ["-y", "@algovaultlabs/mcp-server@latest"],
env: { ALGOVAULT_API_KEY: process.env.ALGOVAULT_API_KEY! },
});
const client = new Client(
{ name: "track-record-agent", version: "1.0.0" },
{ capabilities: {} }
);
await client.connect(transport);
// get_performance_summary is public — no subscription tier required
const summary = await client.callTool({ name: "get_performance_summary", arguments: {} });
console.log(summary.content[0].text);
// → "90.3% PFE win rate across 83,480+ verified calls. Merkle-verified on Base L2. Don't trust — verify."
get_performance_summary reads from the public /api/performance-public endpoint and returns the canonical accuracy phrase without requiring an API key. This is intentional: AlgoVault's headline accuracy claim is verifiable before you purchase a subscription, not after. An agent can confirm the live number at initialization and abort the session if it falls below a configured threshold — no human check required.
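That initialization gate might look like the following. The regex keys off the canonical phrase shown in the example output above; the helper names and the parsing approach are assumptions, not part of AlgoVault's SDK.

```typescript
// Extract the live PFE win rate from the public summary phrase, e.g.
// "90.3% PFE win rate across 83,480+ verified calls. ..."
function parseWinRate(summaryText: string): number | null {
  const m = summaryText.match(/([\d.]+)%\s*PFE win rate/);
  return m ? parseFloat(m[1]) : null;
}

// Abort the session before any trade logic runs if the live number
// falls below the configured floor.
function assertAccuracyFloor(summaryText: string, floorPct: number): void {
  const rate = parseWinRate(summaryText);
  if (rate === null) throw new Error("could not parse win rate from summary");
  if (rate < floorPct) {
    throw new Error(`live win rate ${rate}% is below the configured floor of ${floorPct}%`);
  }
}
```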
Block 2 — MCP input validation in action
The get_trade_signal tool requires a coin parameter. Omitting it triggers strict JSON Schema validation at the MCP layer, before any network call is attempted:
{
"content": [
{
"type": "text",
"text": "MCP error -32602: Input validation error: Invalid arguments for tool get_trade_signal: [\n {\n \"code\": \"invalid_type\",\n \"expected\": \"string\",\n \"received\": \"undefined\",\n \"path\": [\n \"coin\"\n ],\n \"message\": \"Required\"\n }\n]"
}
],
"isError": true
}
Error -32602 is the standard JSON-RPC invalid-params code. The structured error format makes it straightforward for an LLM agent to parse and retry with corrected arguments — no fragile string matching against an HTTP error page required. A well-formed call passes coin (e.g., "BTC"), an optional confidence_threshold (integer, 0–100), and an optional timeframe. A valid response includes an _algovault metadata block containing the call ID, the Merkle leaf hash, and the proof_url for the inclusion branch — the three fields the verification step in Block 3 depends on.
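Recovering the missing-argument paths from that structured error is mechanical. This sketch assumes the error text embeds the issue array exactly as in the example above; the parsing approach is an assumption, not an official client helper.

```typescript
// Shape of one issue in the embedded validation array (from the example above).
interface ValidationIssue {
  code: string;
  path: (string | number)[];
  message: string;
}

// Pull the JSON issue array out of an MCP -32602 error string and return the
// dotted paths of required-but-missing parameters, so an agent can retry
// with corrected arguments.
function missingParams(errorText: string): string[] {
  const start = errorText.indexOf("[");
  if (start === -1) return [];
  const issues: ValidationIssue[] = JSON.parse(errorText.slice(start));
  return issues
    .filter((i) => i.message === "Required")
    .map((i) => i.path.join("."));
}
```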
Block 3 — Agent loop integration in dry-run mode
# @algovaultlabs/mcp-server@^2.x — requires Node 18+
# DRYRUN_MODE=1 validates connectivity and schema without live execution
DRYRUN_MODE=1 npx -y @algovaultlabs/mcp-server@latest \
--tool get_trade_signal \
--args '{"coin":"BTC","confidence_threshold":70}'
Terminal output captured during drafting:
# AlgoVault MCP example — assets=BTC confidence_threshold=70
[BTC] ERROR: HTTP 406
# DRYRUN_MODE=1 — example complete
The HTTP 406 response in DRYRUN_MODE indicates a content-negotiation failure against the tier table — the test key used in this session did not have BTC coverage at the 70-confidence threshold. In a production session with a credentialed key at the appropriate tier, the same call returns a verdict JSON with _algovault.merkle_leaf populated. The Pitfalls section covers exactly this failure mode.
In a live Claude Code multi-agent loop, the Merkle verification step sits between the signal-fetching sub-agent and the execution sub-agent. The execution sub-agent reads proof_url from the verdict, fetches the branch from /api/track-record/proof, verifies it against the Base L2 root via eth_call, and only proceeds if the leaf validates. A failed proof check halts the loop and routes to a human-review queue — no silent failure path.
Pitfalls and Design Decisions
1. DRYRUN_MODE applies tier content-negotiation. The dry-run harness validates schema and connectivity but does not bypass the tier table. HTTP 406 means your API key lacks coverage for the requested asset or confidence level — it is not a server error. Verify your key's coverage tier in the AlgoVault dashboard before attributing 406 to a bug in your integration code. This trips up most first-time integrations.
2. Merkle batch lag creates a pending window. Calls are anchored in batches, not individually in real time. A call made during the current batch window will not have an on-chain root until the batch closes and the transaction confirms. The proof endpoint returns {"status":"pending"} during this window. Implement a retry loop with exponential backoff starting at 30 seconds; do not treat pending as an error in circuit-breaker logic. Agents that treat pending as a failure state will reject a significant share of valid recent calls.
3. Coverage is 732+ assets, not exhaustive. Illiquid long-tail assets may not be covered. Call get_asset_coverage at agent initialization — not per-signal — and cache the response for the session. A get_trade_signal request on an uncovered asset returns a structured coverage error, never a fabricated verdict. This is a hard design constraint: the system only produces verdicts it can anchor.
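The retry guidance in pitfall 2 can be sketched as a small backoff loop. `fetchProof` here is a hypothetical wrapper around the /api/track-record/proof endpoint, and the response union is an assumed shape:

```typescript
// Assumed proof-endpoint response: pending during the batch window,
// anchored once the root is confirmed on Base L2.
type ProofResponse = { status: "pending" } | { status: "anchored"; branch: unknown };

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function awaitProof(
  fetchProof: () => Promise<ProofResponse>,
  baseDelayMs = 30_000, // start at 30 seconds, per the guidance above
  maxAttempts = 6,
): Promise<ProofResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchProof();
    // Pending is not a failure: the batch simply has not anchored yet.
    if (res.status !== "pending") return res;
    await sleep(baseDelayMs * 2 ** attempt); // 30s, 60s, 120s, ...
  }
  throw new Error("proof still pending after max attempts");
}
```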
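The initialization-time coverage check in pitfall 3 fits naturally in a session-scoped cache. The `get_asset_coverage` tool name comes from the text; the response shape (a flat list of tickers) and the loader wiring are assumptions:

```typescript
// Call get_asset_coverage once at agent initialization, then answer
// per-signal coverage questions from the cached set.
class CoverageCache {
  private covered: Set<string> | null = null;

  constructor(private loadCoverage: () => Promise<string[]>) {}

  async init(): Promise<void> {
    this.covered = new Set((await this.loadCoverage()).map((t) => t.toUpperCase()));
  }

  isCovered(asset: string): boolean {
    if (this.covered === null) throw new Error("coverage cache not initialised");
    return this.covered.has(asset.toUpperCase());
  }
}
```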
The decision to batch anchoring rather than write each call to the chain individually was deliberate. Per-call on-chain writes at 83,480+ call volumes would impose gas costs orders of magnitude higher than batched Merkle roots, with no gain in proof strength — a Merkle inclusion proof is equally valid whether a root covers 100 or 10,000 calls. The security guarantee is identical; the operational cost is not.
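To make "orders of magnitude" concrete, a back-of-the-envelope comparison. The per-write gas figure and batch size below are hypothetical, chosen only to show the shape of the ratio, which is what matters here:

```typescript
const CALLS = 83_480;
const GAS_PER_ROOT_WRITE = 50_000; // assumed cost of one anchoring transaction
const BATCH_SIZE = 1_000;          // assumed calls per anchored batch

// One transaction per call vs one transaction per batch.
const perCallGas = CALLS * GAS_PER_ROOT_WRITE;
const batchedGas = Math.ceil(CALLS / BATCH_SIZE) * GAS_PER_ROOT_WRITE;

// Batching cuts on-chain writes by roughly the batch size,
// with identical proof strength per call.
const savingsFactor = perCallGas / batchedGas;
```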
Performance: What the On-Chain Data Shows
The headline figure — 90.3% PFE win rate across 83,480+ verified calls — warrants unpacking for quant and systematic builders accustomed to healthy scepticism about vendor accuracy claims.
PFE is defined as directional confirmation within a holding window calibrated per asset class. The full call population is committed to the Merkle tree before outcomes are resolved. The commitment timestamp is anchored to Base L2; the outcome timestamp is recorded after the holding window closes. Any agent can verify that both timestamps are in the correct causal order by reading the on-chain anchored root's block timestamp — the cryptographic structure enforces it. The live track record page exposes the current aggregate and links directly to the on-chain root for the most recent anchored batch.
For regime-specific performance, coverage varies by asset class and timeframe. BTC and ETH large-cap calls have the longest history and the deepest Merkle coverage. Emerging L1s and cross-venue arbitrage pairs have higher coverage gaps during stressed regime transitions. This is not a weakness to paper over — it is documented openly because the Merkle tree makes any gap structurally detectable regardless. Agents querying these asset classes should check pending rates as a leading indicator of coverage thinning in unusual market conditions.
The composite verdict architecture covered in earlier posts in this series — regime classification, cross-venue signal normalisation, confidence scoring — is the upstream reason the PFE rate holds across bull and bear regimes without separate backtest curves. The on-chain record is the audit trail for those claims.
What's Next?
The track record is public, on-chain, and queryable without a subscription. Start there before evaluating anything else.
- Verify the numbers yourself: https://algovault.com/track-record — live PFE aggregate, call history by asset class, and direct links to anchored Base L2 roots.
- Connect your agent in under 10 minutes: https://algovault.com/docs — MCP quick-start, API reference, proof-verification scripts, and coverage tier documentation.
- Explore the open-source tooling: https://github.com/AlgoVaultLabs — MCP server source, Merkle proof verifier, and example multi-agent loop implementations.
Mr.1 — AlgoVault Labs