Wallet Guy

Posted on Jul 2 • Edited on Jul 16

683 Test Files Later: How We Validate AI Agent Wallet Infrastructure

#testing #typescript #architecture #opensource

Your AI agent can browse the web, write code, and manage files — but can it actually touch money? That's the gap WAIaaS was built to close: a self-hosted, open-source Wallet-as-a-Service that gives your AI agent a real blockchain wallet, a policy engine, and a transaction pipeline it can use autonomously. And before any of that ships to production, it has to pass more than 683 test files.

Why Test Coverage Matters for Wallet Infrastructure

When your agent sends an email, a bug means a bad email. When your agent sends 0.1 ETH to the wrong address, a bug means lost funds. The stakes are categorically different.

This isn't about chasing a coverage number. Wallet infrastructure for AI agents sits at the intersection of two unforgiving domains: financial transactions (irreversible, high-stakes) and autonomous software (runs without human review). If you're building an agent on top of a wallet layer, you need to know that layer has been tested hard before you trust it with real assets.

Here's a practical look at what WAIaaS actually tests, and more importantly, what that means for you as a developer building on top of it.

The Architecture Under Test

WAIaaS is a 15-package monorepo. Each package has its own test suite, and together they cover every layer of the system an AI agent will touch.

actions, adapters, admin, cli, core, daemon, desktop-spike,
e2e-tests, mcp, openclaw-plugin, push-relay, sdk, shared, skills, wallet-sdk

That's 683+ test files spread across packages that cover:

The transaction pipeline: a 7-stage pipeline whose stages include validate, auth, policy, wait, execute, and confirm
The policy engine: 21 policy types and 4 security tiers
45 MCP tools: every tool your Claude or LangChain agent will call
15 DeFi protocol integrations: including Jupiter, Aave v3, Hyperliquid, and more
39 REST API route modules: every endpoint the SDK talks to

When you call client.sendToken() from the TypeScript SDK, you're exercising code that has been tested at the unit level, the integration level, and the pipeline level. Let's walk through each of those layers.

Layer 1: The Transaction Pipeline

Every transaction an AI agent submits goes through a 7-stage pipeline. Six of those stages are named here:

stage1-validate — schema validation and chain-specific checks
stage2-auth — session token verification
stage3-policy — policy engine evaluation against all active policies
stage4-wait — handles DELAY and APPROVAL tier transactions
stage5-execute — signs and broadcasts to the network
stage6-confirm — monitors for on-chain confirmation

This pipeline is what stands between your agent's intent and an actual blockchain transaction. The test suite covers every stage, including the unhappy paths: what happens when a policy blocks a transaction, what happens when a DELAY times out, what happens when broadcast fails and needs to be retried.
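To make the unhappy paths concrete, here is a minimal sketch of a staged pipeline that stops at the first stage that rejects a transaction and reports where it stopped. Every type and function name in it (`Stage`, `runPipeline`, `passing`) is hypothetical and chosen for illustration; this is not the WAIaaS source, only the general pattern that such tests exercise.

```typescript
// Hypothetical types for illustration; not taken from the WAIaaS codebase.
type StageName = 'validate' | 'auth' | 'policy' | 'wait' | 'execute' | 'confirm';

interface TxRequest {
  to: string;
  amount: string;
}

type StageResult = { ok: true } | { ok: false; reason: string };

interface Stage {
  name: StageName;
  run: (tx: TxRequest) => Promise<StageResult>;
}

type PipelineOutcome =
  | { status: 'COMPLETED' }
  | { status: 'FAILED'; failedAt: StageName; reason: string };

// Run the stages in order and stop at the first one that rejects the transaction.
async function runPipeline(stages: Stage[], tx: TxRequest): Promise<PipelineOutcome> {
  for (const stage of stages) {
    const result = await stage.run(tx);
    if (!result.ok) {
      return { status: 'FAILED', failedAt: stage.name, reason: result.reason };
    }
  }
  return { status: 'COMPLETED' };
}

// Test helper: a stage that always passes and records that it ran.
function passing(name: StageName, log: StageName[]): Stage {
  return {
    name,
    run: async () => {
      log.push(name);
      return { ok: true };
    },
  };
}
```

A test for the "policy blocks a transaction" case would put a denying stage in the `policy` slot and assert two things: the outcome names `policy` as the failing stage, and no later stage (in particular `execute`) ever ran.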

From your agent's perspective, this pipeline is transparent. You submit a transaction and get back a status. But knowing it's there — and tested — is what lets you trust the output.

Here's a basic send from the TypeScript SDK:

import { WAIaaSClient, WAIaaSError } from '@waiaas/sdk';

const client = new WAIaaSClient({
  baseUrl: process.env['WAIAAS_BASE_URL'] ?? 'http://localhost:3100',
  sessionToken: process.env['WAIAAS_SESSION_TOKEN'],
});

// Step 1: Check wallet balance
const balance = await client.getBalance();
console.log(`Balance: ${balance.balance} ${balance.symbol} (${balance.chain}/${balance.network})`);

// Step 2: Send tokens
const sendResult = await client.sendToken({
  type: 'TRANSFER',
  to: 'recipient-address',
  amount: '0.001',
});
console.log(`Transaction submitted: ${sendResult.id} (status: ${sendResult.status})`);

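// Step 3: Poll for confirmation until the transaction settles or the timeout expires
const POLL_TIMEOUT_MS = 60_000;
const POLL_INTERVAL_MS = 1_000;
const startTime = Date.now();
let settled = false;
while (Date.now() - startTime < POLL_TIMEOUT_MS) {
  const tx = await client.getTransaction(sendResult.id);
  if (tx.status === 'COMPLETED') {
    console.log(`Transaction confirmed! Hash: ${tx.txHash}`);
    settled = true;
    break;
  }
  if (tx.status === 'FAILED') {
    console.error(`Transaction failed: ${tx.error}`);
    settled = true;
    break;
  }
  // Wait before the next status check
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
}
// Without this check, a timeout would exit the loop silently
if (!settled) {
  console.warn(`Transaction ${sendResult.id} still pending after ${POLL_TIMEOUT_MS} ms`);
}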

The pattern is simple because the pipeline complexity is encapsulated. Your agent doesn't need to know about stage3-policy or stage4-wait. It just polls until the status is COMPLETED or FAILED.

Layer 2: The Policy Engine

The policy engine is probably the most critical thing to get right, and it's the part of WAIaaS with the most test surface area.

21 policy types. 4 security tiers. Default-deny enforcement. A single misconfigured policy could either block legitimate agent transactions or, worse, allow a transaction that should have required human approval.

The 4 tiers tell you exactly what will happen to a transaction:

INSTANT   — Execute immediately, no notification
NOTIFY    — Execute immediately, send notification
DELAY     — Queue for delay_seconds, then execute (cancellable)
APPROVAL  — Require human approval via WalletConnect/Telegram/Push

Testing this correctly means verifying every boundary condition. An instant_max_usd of $10 means a $10.00 transaction is INSTANT and a $10.01 transaction is NOTIFY. Those boundary tests exist in the suite.
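As a rough illustration of what such a boundary test checks, the sketch below maps a USD amount to a tier using the three thresholds of a SPENDING_LIMIT policy. The function name `classifyTier` is hypothetical, and two things are assumptions rather than documented behavior: that every threshold is inclusive (the text above only states this for instant_max_usd), and that anything above delay_max_usd falls through to APPROVAL.

```typescript
type Tier = 'INSTANT' | 'NOTIFY' | 'DELAY' | 'APPROVAL';

interface SpendingLimitRules {
  instant_max_usd: number;
  notify_max_usd: number;
  delay_max_usd: number;
}

// Assumption: thresholds are inclusive, and amounts above delay_max_usd need approval.
function classifyTier(amountUsd: number, rules: SpendingLimitRules): Tier {
  if (amountUsd <= rules.instant_max_usd) return 'INSTANT';
  if (amountUsd <= rules.notify_max_usd) return 'NOTIFY';
  if (amountUsd <= rules.delay_max_usd) return 'DELAY';
  return 'APPROVAL';
}

const rules: SpendingLimitRules = {
  instant_max_usd: 10,
  notify_max_usd: 500,
  delay_max_usd: 2000,
};

console.log(classifyTier(10.0, rules));  // INSTANT
console.log(classifyTier(10.01, rules)); // NOTIFY
```

The sketch uses plain floating-point numbers for brevity. Code that handles real money would more likely compare integer minor units or a decimal type so that the boundary itself cannot drift.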

Here's what a spending limit policy looks like when you create it via the REST API:

curl -X POST http://127.0.0.1:3100/v1/policies \
  -H "Content-Type: application/json" \
  -H "X-Master-Password: my-secret-password" \
  -d '{
    "walletId": "<wallet-uuid>",
    "type": "SPENDING_LIMIT",
    "rules": {
      "instant_max_usd": 100,
      "notify_max_usd": 500,
      "delay_max_usd": 2000,
      "delay_seconds": 900,
      "daily_limit_usd": 5000
    }
  }'

And if a transaction is blocked, the error response is structured and actionable:

{
  "error": {
    "code": "POLICY_DENIED",
    "message": "Transaction denied by SPENDING_LIMIT policy",
    "domain": "POLICY",
    "retryable": false
  }
}

Your agent can catch this, log it, and surface it to a human rather than silently failing. The test suite covers both the policy evaluation logic and the error response format, so you can build reliable error handling on top of a stable contract.
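One way an agent might act on that contract is sketched below. It validates the shape of the error body shown above and then picks a next step. The `AgentAction` values and the function names are hypothetical, and the decision rules (retry when `retryable` is true, escalate non-retryable POLICY errors, abort otherwise) are one reasonable choice, not something WAIaaS prescribes.

```typescript
interface WaiaasErrorBody {
  error: { code: string; message: string; domain: string; retryable: boolean };
}

// Hypothetical set of next steps an agent could take.
type AgentAction = 'RETRY' | 'ESCALATE_TO_HUMAN' | 'ABORT';

// Type guard: checks that an unknown value has the error shape shown above.
function isWaiaasErrorBody(value: unknown): value is WaiaasErrorBody {
  if (typeof value !== 'object' || value === null) return false;
  const err = (value as { error?: unknown }).error;
  if (typeof err !== 'object' || err === null) return false;
  const e = err as Record<string, unknown>;
  return (
    typeof e['code'] === 'string' &&
    typeof e['message'] === 'string' &&
    typeof e['domain'] === 'string' &&
    typeof e['retryable'] === 'boolean'
  );
}

// Decide what to do with an error response body.
function decideAction(body: unknown): AgentAction {
  if (!isWaiaasErrorBody(body)) return 'ABORT'; // Unknown shape: do not guess
  if (body.error.retryable) return 'RETRY';
  if (body.error.domain === 'POLICY') return 'ESCALATE_TO_HUMAN';
  return 'ABORT';
}

const denied: unknown = JSON.parse(
  '{"error":{"code":"POLICY_DENIED","message":"Transaction denied by SPENDING_LIMIT policy","domain":"POLICY","retryable":false}}',
);
console.log(decideAction(denied)); // ESCALATE_TO_HUMAN
```

Aborting on an unrecognized shape is deliberate: an agent that handles funds should fail closed rather than guess.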

Layer 3: The MCP Tools

If you're building on Claude or another MCP-compatible framework, your agent interacts with WAIaaS through 45 MCP tools. Every one of those tools is a tested surface.

The tool list covers the full range of what an autonomous agent might need:

Wallet operations: get-balance, get-address, get-assets, get-wallet-info
Transactions: send-token, send-batch, sign-transaction, simulate-transaction
DeFi: action-provider, get-defi-positions, get-health-factor
NFTs: get-nft-metadata, list-nfts, transfer-nft
Protocol-specific: hyperliquid, polymarket, x402-fetch
Security/auth: erc8004-get-reputation, wc-connect, list-sessions

Setting up MCP with Claude Desktop takes one command:

waiaas mcp setup --all    # Auto-register all wallets with Claude Desktop

Or you can configure it manually in claude_desktop_config.json:

{
  "mcpServers": {
    "waiaas": {
      "command": "npx",
      "args": ["-y", "@waiaas/mcp"],
      "env": {
        "WAIAAS_BASE_URL": "http://127.0.0.1:3100",
        "WAIAAS_SESSION_TOKEN": "wai_sess_<your-token>",
        "WAIAAS_DATA_DIR": "~/.waiaas"
      }
    }
  }
}

After that, Claude can call get_balance, send_token, or execute_action the same way it calls any other tool — but now the infrastructure behind those calls has been validated by 683+ test files.

Layer 4: The DeFi Protocol Integrations

15 DeFi protocol providers are integrated in WAIaaS:

aave-v3, across, dcent-swap, drift, erc8004, hyperliquid,
jito-staking, jupiter-swap, kamino, lido-staking, lifi,
pendle, polymarket, xrpl-dex, zerox-swap

Each provider has its own action logic. Testing these means mocking RPC calls, simulating swap quotes, and verifying that the action payload built for Jupiter looks correct before it ever hits mainnet.

There's also a dry-run capability built into the transaction pipeline. Before your agent executes a transaction for real, whether a plain transfer or a DeFi action, it can simulate it:

curl -X POST http://127.0.0.1:3100/v1/transactions/send \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wai_sess_<token>" \
  -d '{
    "type": "TRANSFER",
    "to": "recipient-address",
    "amount": "0.1",
    "dryRun": true
  }'

This is a first-class feature of the API, not a workaround. You can build agents that dry-run before executing and only proceed if the simulation succeeds.
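A dry-run gate of that kind could look roughly like the sketch below. The transport is injected as a `submit` function so the logic can be exercised without a running daemon. Both `submit` and the `SubmitResult` shape are assumptions made for this example, since the dry-run response format is not shown here; only the request fields (`type`, `to`, `amount`, `dryRun`) come from the curl call above.

```typescript
interface TransferRequest {
  type: 'TRANSFER';
  to: string;
  amount: string;
  dryRun?: boolean;
}

// Assumed result shape; the real API response may differ.
interface SubmitResult {
  ok: boolean;
  detail: string;
}

// Hypothetical transport: in practice this would wrap the SDK or an HTTP call.
type Submit = (req: TransferRequest) => Promise<SubmitResult>;

// Simulate first, and only send the real transaction if the simulation succeeds.
async function sendIfDryRunPasses(submit: Submit, req: TransferRequest): Promise<SubmitResult> {
  const simulation = await submit({ ...req, dryRun: true });
  if (!simulation.ok) {
    return { ok: false, detail: `dry run failed: ${simulation.detail}` };
  }
  return submit({ ...req, dryRun: false });
}
```

Injecting `submit` also makes the gate easy to test: a fake that rejects the dry run should see exactly one call, with `dryRun` set to true.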

Quick Start: Running It Yourself

You don't need to take our word for the test coverage: you can clone the repo and run the suite locally. If you'd rather get an agent connected first, follow these steps.

Step 1 — Install the CLI and start the daemon

npm install -g @waiaas/cli
waiaas init
waiaas start

Step 2 — Create wallets and sessions in one command

waiaas quickset --mode mainnet

Step 3 — Connect to Claude Desktop

waiaas mcp setup --all

Step 4 — Or use the TypeScript SDK directly

npm install @waiaas/sdk

import { WAIaaSClient } from '@waiaas/sdk';

const client = new WAIaaSClient({
  baseUrl: 'http://127.0.0.1:3100',
  sessionToken: process.env.WAIAAS_SESSION_TOKEN,
});

const balance = await client.getBalance();
console.log(`${balance.balance} ${balance.symbol}`);

Step 5 — Set a policy before going to mainnet

Use the policy API to configure a spending limit before you let your agent run autonomously. Even a simple SPENDING_LIMIT with a $10 instant max gives you a meaningful safety net.
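If you script this step, a small builder that sanity-checks the thresholds before anything is sent can catch a misordered policy early. The sketch below only constructs the JSON body for the policy endpoint. The function name `buildSpendingLimitPolicy` is hypothetical, the ordering check is a client-side assumption rather than a documented server rule, and every value other than the $10 instant max is illustrative.

```typescript
interface SpendingLimitRules {
  instant_max_usd: number;
  notify_max_usd: number;
  delay_max_usd: number;
  delay_seconds: number;
  daily_limit_usd: number;
}

interface PolicyRequest {
  walletId: string;
  type: 'SPENDING_LIMIT';
  rules: SpendingLimitRules;
}

// Build the request body, rejecting obviously inconsistent thresholds.
function buildSpendingLimitPolicy(walletId: string, rules: SpendingLimitRules): PolicyRequest {
  const { instant_max_usd, notify_max_usd, delay_max_usd, delay_seconds, daily_limit_usd } = rules;
  const values = [instant_max_usd, notify_max_usd, delay_max_usd, delay_seconds, daily_limit_usd];
  if (values.some((v) => !Number.isFinite(v) || v < 0)) {
    throw new Error('All rule values must be finite, non-negative numbers');
  }
  // Assumption: tiers escalate with amount, so thresholds should be ascending.
  if (!(instant_max_usd <= notify_max_usd && notify_max_usd <= delay_max_usd)) {
    throw new Error('Expected instant_max_usd <= notify_max_usd <= delay_max_usd');
  }
  return { walletId, type: 'SPENDING_LIMIT', rules: { ...rules } };
}

const policy = buildSpendingLimitPolicy('<wallet-uuid>', {
  instant_max_usd: 10,
  notify_max_usd: 100,
  delay_max_usd: 500,
  delay_seconds: 900,
  daily_limit_usd: 1000,
});
console.log(JSON.stringify(policy, null, 2));
```

The resulting object has the same fields as the body of the curl example in the policy engine section, so it can be posted to the same endpoint with the same headers.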

What 683 Test Files Actually Tell You

A number like 683 test files is only meaningful in context. Here's the context:

The transaction pipeline has stages that are individually tested — including the gas condition stage that holds transactions until gas price meets a threshold
The policy engine covers all 21 policy types and their boundary conditions
The 3 auth methods (masterAuth with Argon2id, ownerAuth with SIWS/SIWE, sessionAuth with JWT HS256) are each tested independently
The 45 MCP tools are tested against the same transaction and policy infrastructure
The OpenAPI 3.0 spec is auto-generated and available at /doc, with an interactive reference UI at /reference — so the API you're coding against is validated against the implementation

None of this means bugs don't exist. It means the team has invested in the kind of validation infrastructure that gives you a reasonable basis for trust when you're building financial tooling for autonomous agents.

What's Next

If you want to go deeper on how the policy engine works in practice — especially the default-deny behavior and how to configure policies for different agent risk profiles — that's worth its own read. The security model has three distinct layers (session auth, time delay and approval, monitoring and kill switch) that work together in ways that aren't obvious from the policy API alone.

The best next step is to get a local instance running and connect it to your agent. The CLI makes that fast, and the MCP integration means you can be talking to a real wallet from Claude Desktop in under ten minutes.