Wallet Guy

611 Tests for AI Agent Wallets: How We Validate Every Transaction Before It Hits Mainnet

Testing AI agent wallets is like debugging a program that can spend real money: you need bulletproof validation before any code touches mainnet. WAIaaS runs 611+ tests across its monorepo to ensure every transaction, policy check, and security layer works as intended when your AI agents start managing actual funds.

Why Testing Matters for Financial AI Agents

When your trading bot executes a $10,000 swap or your DeFi agent stakes ETH, there's no "undo" button. A single bug in transaction validation could drain wallets. A faulty policy engine might let agents bypass spending limits. Broken auth could expose private keys.

Traditional software testing catches crashes and logic errors. Crypto wallet testing catches financial disasters.

WAIaaS Testing Architecture

The WAIaaS codebase includes 611+ test files spread across its 15-package monorepo. Every component that touches money, keys, or transactions has dedicated test coverage.

Core Components Under Test

The test suite validates:

  • 7-stage transaction pipeline: stages such as validate, auth, policy, wait, execute, and confirm each have isolated tests
  • 21 policy types across 4 security tiers: INSTANT, NOTIFY, DELAY, and APPROVAL
  • 3 authentication methods: masterAuth (Argon2id), ownerAuth (SIWS/SIWE), and sessionAuth (JWT HS256)
  • 15 DeFi protocol integrations: Jupiter, Lido, Hyperliquid, Polymarket, and others
  • Cross-chain functionality: 18 networks across Solana and EVM chains
  • 39 REST API routes: every endpoint that agents call

Transaction Validation Testing

Before any transaction reaches the blockchain, it passes through comprehensive validation:

# Test transaction building and validation
curl -X POST http://127.0.0.1:3100/v1/transactions/send \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wai_sess_<token>" \
  -d '{
    "type": "TRANSFER",
    "to": "recipient-address",
    "amount": "0.1",
    "dryRun": true
  }'

The dry-run API lets you test transaction logic without spending gas. Tests validate:

  • Address format checking
  • Balance sufficiency
  • Gas estimation accuracy
  • Policy compliance simulation
  • Multi-signature coordination
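
To make the first check in the list concrete, here is a minimal sketch of address format checking for the two chain families mentioned in this article. It is not taken from the WAIaaS codebase: the function names are hypothetical, and it checks shape only (no EIP-55 checksum, no on-curve check for Solana keys).

```python
import re

# EVM addresses: "0x" followed by exactly 40 hexadecimal characters.
_EVM_RE = re.compile(r"0x[0-9a-fA-F]{40}")

# Base58 alphabet used by Solana (omits 0, O, I, and l).
_BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_evm_address(addr: str) -> bool:
    """Shape check only; does not verify the EIP-55 mixed-case checksum."""
    return _EVM_RE.fullmatch(addr) is not None


def base58_decode(text: str) -> bytes:
    """Decode a base58 string; raises ValueError on characters outside the alphabet."""
    num = 0
    for ch in text:
        idx = _BASE58_ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid base58 character: {ch!r}")
        num = num * 58 + idx
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' encodes one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def is_solana_address(addr: str) -> bool:
    """A Solana address is a base58-encoded 32-byte public key."""
    try:
        return len(base58_decode(addr)) == 32
    except ValueError:
        return False
```

A placeholder such as "recipient-address" fails both checks, which is the kind of input a dry run should reject before any gas estimation happens.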

Policy Engine Testing

The policy engine enforces 21 different policy types with default-deny security. Tests ensure policies actually block unauthorized transactions:

# Test spending limit enforcement
curl -X POST http://127.0.0.1:3100/v1/policies \
  -H "Content-Type: application/json" \
  -H "X-Master-Password: my-secret-password" \
  -d '{
    "walletId": "<wallet-uuid>",
    "type": "SPENDING_LIMIT",
    "rules": {
      "instant_max_usd": 10,
      "notify_max_usd": 100,
      "delay_max_usd": 1000,
      "delay_seconds": 300,
      "daily_limit_usd": 500
    }
  }'

Policy tests verify:

  • Default-deny behavior when ALLOWED_TOKENS isn't configured
  • Tier assignment logic (INSTANT → NOTIFY → DELAY → APPROVAL)
  • Rate limiting across time windows
  • Contract whitelist enforcement
  • Cross-chain policy inheritance
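
The tier ladder can be sketched as a small pure function. This assumes each `*_max_usd` value is an inclusive ceiling and that anything above `delay_max_usd` needs APPROVAL; the article does not spell out those semantics, and `assign_tier` is a hypothetical name rather than a WAIaaS function.

```python
from decimal import Decimal

# Example rules mirroring the SPENDING_LIMIT request above.
RULES = {"instant_max_usd": 10, "notify_max_usd": 100, "delay_max_usd": 1000}


def assign_tier(amount_usd: Decimal, rules: dict) -> str:
    """Return the first tier whose ceiling covers the amount (assumed semantics)."""
    if amount_usd < 0:
        raise ValueError("amount must be non-negative")
    ladder = [
        ("INSTANT", rules["instant_max_usd"]),
        ("NOTIFY", rules["notify_max_usd"]),
        ("DELAY", rules["delay_max_usd"]),
    ]
    for tier, ceiling in ladder:
        # Decimal avoids float rounding right at a tier boundary.
        if amount_usd <= Decimal(str(ceiling)):
            return tier
    return "APPROVAL"
```

Boundary values are where tier bugs tend to hide, so a test for this kind of function would probe amounts exactly at and just above each ceiling.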

DeFi Integration Testing

Each of the 15 DeFi protocol providers has dedicated test coverage for complex multi-step operations:

# Test Jupiter swap simulation
curl -X POST http://127.0.0.1:3100/v1/actions/jupiter-swap/swap \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wai_sess_<token>" \
  -d '{
    "inputMint": "So11111111111111111111111111111111111111112",
    "outputMint": "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v",
    "amount": "1000000000",
    "dryRun": true
  }'

DeFi tests cover:

  • Price impact calculations
  • Slippage protection
  • Multi-hop routing validation
  • Liquidity availability checks
  • Protocol-specific error handling
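
As an illustration of what a slippage protection test asserts, the sketch below computes the smallest acceptable output for a quote. Expressing the tolerance in basis points and the helper names are assumptions made for this example; they are not WAIaaS or protocol APIs.

```python
def min_amount_out(quoted_out: int, slippage_bps: int) -> int:
    """Smallest acceptable output, in the token's base units, for a given tolerance."""
    if not 0 <= slippage_bps <= 10_000:
        raise ValueError("slippage_bps must be between 0 and 10000")
    # Integer math with floor division avoids float rounding on large amounts.
    return quoted_out * (10_000 - slippage_bps) // 10_000


def violates_slippage(quoted_out: int, actual_out: int, slippage_bps: int) -> bool:
    """True when the executed output falls below the tolerated minimum."""
    return actual_out < min_amount_out(quoted_out, slippage_bps)
```

With a quote of 1,000,000 base units and a 50 bps tolerance, the floor is 995,000 units; a fill of 994,999 should be rejected.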

Authentication Security Testing

The three-layer auth system undergoes rigorous testing:

# Test session creation and validation
curl -X POST http://127.0.0.1:3100/v1/sessions \
  -H "Content-Type: application/json" \
  -H "X-Master-Password: my-secret-password" \
  -d '{"walletId": "<wallet-uuid>"}'

Auth tests validate:

  • Argon2id password hashing resistance
  • JWT token expiration and renewal
  • SIWS/SIWE signature verification
  • Session isolation between wallets
  • Token revocation mechanisms
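
To show what a JWT expiration test exercises, here is a standard-library sketch of HS256 signing and verification. It is an illustration only: a production system would normally rely on a vetted JWT library, and whether WAIaaS session tokens carry the `sub` and `exp` claims used here is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    # Pin the algorithm so a token cannot downgrade itself to "none".
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    current = time.time() if now is None else now
    if "exp" not in claims or current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Passing `now` explicitly lets a test move the clock past `exp` without sleeping, which keeps expiration tests fast and deterministic.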

Running WAIaaS Tests

Tests run automatically in CI/CD, but you can execute them locally:

# Clone and set up
git clone https://github.com/minhoyoo-iotrust/WAIaaS.git
cd WAIaaS
pnpm install

# Run all tests
pnpm test

# Run specific package tests
pnpm --filter @waiaas/daemon test
pnpm --filter @waiaas/core test

# Integration tests with Docker
docker compose -f docker-compose.test.yml up --abort-on-container-exit

Docker Test Environment

WAIaaS includes Docker-based integration testing that spins up the full stack:

# docker-compose.test.yml
services:
  daemon:
    image: ghcr.io/minhoyoo-iotrust/waiaas:latest
    environment:
      - NODE_ENV=test
      - WAIAAS_AUTO_PROVISION=true
    volumes:
      - test-data:/data

  e2e-tests:
    build: packages/e2e-tests
    depends_on:
      - daemon
    environment:
      - WAIAAS_BASE_URL=http://daemon:3100

# Named volumes referenced by a service must be declared at the top level
volumes:
  test-data:

This catches integration issues that unit tests miss—like network timeouts, database locks, or race conditions in the 7-stage transaction pipeline.
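
One startup race worth guarding against: the short-form `depends_on` in the compose file above only orders container startup and does not wait for the daemon to accept requests. A minimal sketch of a readiness poll that an e2e suite could run first is shown below. `wait_until_ready` is a hypothetical helper, and the probe is left to the caller because this article does not name a health endpoint.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or reset while the daemon is still booting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A probe built on `urllib.request.urlopen` fits this shape, since `urllib.error.URLError` is a subclass of `OSError`. Injecting `sleep` and `clock` keeps the helper itself testable without real waiting.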

Security-First Test Cases

Beyond functional testing, WAIaaS includes security-focused test scenarios:

Wallet Isolation Testing

  • Cross-wallet transaction attempts (should fail)
  • Session token reuse across wallets (should fail)
  • Policy inheritance between wallets (should be isolated)

Attack Vector Testing

  • Replay attack prevention
  • Rate limit bypass attempts
  • Authorization header manipulation
  • SQL injection in wallet names/metadata
  • XSS in admin UI components
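
The article does not describe how WAIaaS prevents replays, so the following is a generic sketch of one common technique: remembering request nonces for a fixed window and rejecting repeats. `NonceRegistry` and its parameters are hypothetical names for this example.

```python
import time
from typing import Callable, Dict


class NonceRegistry:
    """Remembers request nonces for ttl_s seconds and rejects repeats."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._seen: Dict[str, float] = {}  # nonce -> expiry time

    def accept(self, nonce: str) -> bool:
        """Return True the first time a nonce is seen, False on a replay."""
        now = self._clock()
        # Drop expired entries so the registry does not grow without bound.
        self._seen = {n: exp for n, exp in self._seen.items() if exp > now}
        if nonce in self._seen:
            return False
        self._seen[nonce] = now + self._ttl_s
        return True
```

Because entries expire, a nonce is accepted again once its window has passed. A real deployment would pair this with a timestamp check so that requests older than the window are rejected outright.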

Fund Safety Testing

  • Insufficient balance handling
  • Gas estimation edge cases
  • Network failure recovery
  • Partial transaction completion
  • Emergency stop functionality

Quick Start: Testing Your Integration

Want to validate WAIaaS before trusting it with real funds? Follow these steps:

  1. Deploy with auto-provision to avoid manual setup:
   docker run -d \
     --name waiaas-test \
     -p 127.0.0.1:3100:3100 \
     -v waiaas-test-data:/data \
     -e WAIAAS_AUTO_PROVISION=true \
     ghcr.io/minhoyoo-iotrust/waiaas:latest
  2. Create test wallet and session:
   # Get auto-generated password
   docker exec waiaas-test cat /data/recovery.key

   # Create wallet
   waiaas quickset --mode testnet
  3. Test transaction simulation:
   # All transactions with dryRun: true are simulated only
   # (assumes WAIAAS_SESSION_TOKEN holds the session token from step 2)
   curl -X POST http://127.0.0.1:3100/v1/transactions/send \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $WAIAAS_SESSION_TOKEN" \
     -d '{"type": "TRANSFER", "to": "test-address", "amount": "0.001", "dryRun": true}'
  4. Validate policy enforcement:
   # Create restrictive spending limit
   # (assumes WALLET_ID holds your wallet UUID; the body is double-quoted so the shell expands it)
   curl -X POST http://127.0.0.1:3100/v1/policies \
     -H "Content-Type: application/json" \
     -H "X-Master-Password: $(docker exec waiaas-test cat /data/recovery.key)" \
     -d "{\"walletId\": \"$WALLET_ID\", \"type\": \"SPENDING_LIMIT\", \"rules\": {\"instant_max_usd\": 1}}"

   # Try transaction above limit (should be denied or queued)
  5. Test the full 45-tool MCP integration with Claude Desktop or other MCP clients to validate the AI agent experience end-to-end.

What's Next

The test suite is there to give you confidence that WAIaaS handles funds securely, but running your own validation is still recommended. The full codebase is on GitHub at https://github.com/minhoyoo-iotrust/WAIaaS, and the interactive API documentation at waiaas.ai shows exactly what each endpoint does before your agents start using it.