Compute Futures for AI Agents: Prepaid GPU and Inference Credits over x402

#x402 #aiagents #crypto #web3

There's a growing conversation about treating compute like a tradable commodity: something you can price, prepay, and hold, not just rent by the second. It's an interesting macro idea for data centers and GPU clusters. But there's a much more immediate version of it that gets far less attention: what does an AI agent do when it needs to buy compute, over and over, without a credit card or an API key?

That's the gap I built Spraay Compute & Futures to close.

The problem with paying retail for every inference call

Pay-per-call micropayments are great for one-off jobs. An agent hits an endpoint, gets a 402 Payment Required, signs a USDC authorization, retries, gets its result. Clean. This is exactly what x402 was built for, and it works.
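
To make that handshake concrete, here is a hand-rolled sketch of the retry loop that a wrapper such as wrapFetchWithPayment handles for you. It is an illustration under assumptions, not the library's implementation: the "X-PAYMENT" header name follows the x402 v1 convention and may differ in the version you run, and signPayment is a hypothetical callback standing in for the wallet's USDC authorization.

```typescript
// Minimal request/response shapes so the sketch runs against a stub or the real fetch.
type Init = { method?: string; headers?: Record<string, string>; body?: string };
type FetchLike = (url: string, init?: Init) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical signer: turns the server's payment requirements into an encoded payment payload.
type SignPayment = (requirements: unknown) => Promise<string>;

async function fetchWith402Retry(
  fetchImpl: FetchLike,
  signPayment: SignPayment,
  url: string,
  init: Init = {},
) {
  const first = await fetchImpl(url, init);
  if (first.status !== 402) return first;      // nothing to pay, pass the response through

  const requirements = await first.json();      // the 402 body describes what must be paid
  const payment = await signPayment(requirements);

  // Retry the same request once, now carrying the signed payment (header name is an assumption).
  return fetchImpl(url, { ...init, headers: { ...init.headers, "X-PAYMENT": payment } });
}
```

Every call made this way costs one extra round trip and one signature, which is the overhead the rest of this post is about.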

But agents don't run one job. They run workloads — a research agent fires off hundreds of inference calls, a content pipeline generates dozens of images, a RAG system embeds thousands of chunks. Two things break down at that volume:

No budget predictability. Every call is a fresh signature and a fresh settlement. There's no way to say "this agent has $50 to spend on compute this week" and have the rails enforce it.
No volume pricing. You pay the same per-call rate whether it's your first request or your ten-thousandth. In every other market, buying in bulk gets you a discount. Agentic compute had no equivalent.

The usual fix is accounts and API keys with prepaid balances — exactly the thing x402 was supposed to kill. So the question became: can you get prepaid, discounted compute without reintroducing accounts?

Compute futures: prepay once, draw down per job

The answer is a prepaid compute-credit account that lives entirely on x402 rails. You deposit USDC once, get a credit balance with a tier discount baked in, and then run jobs against that balance. No per-call payment for the compute itself — each job just deducts from your credits at the discounted rate. Refund whatever you don't use.

The tiers are simple:

$10+ → 5% off
$50+ → 10% off
$200+ → 15% off
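
As a rough sketch, the tier lookup can be written as a small function. The thresholds and percentages come from the list above; the helper names are hypothetical, and treating deposits under $10 as "no discount" is an assumption, since the post doesn't say whether smaller deposits are accepted. Prices are kept in integer micro-USDC to avoid floating-point drift.

```typescript
// Tier thresholds in whole USDC, highest first; discounts in basis points (500 = 5%).
const TIERS = [
  { minDeposit: 200, discountBps: 1500 },
  { minDeposit: 50, discountBps: 1000 },
  { minDeposit: 10, discountBps: 500 },
];

// Returns the discount for a deposit. Assumes anything under $10 gets no discount.
function discountBpsFor(depositUsdc: number): number {
  for (const tier of TIERS) {
    if (depositUsdc >= tier.minDeposit) return tier.discountBps;
  }
  return 0;
}

// Applies a discount to a price in micro-USDC (1 USDC = 1_000_000).
function discountedMicros(priceMicros: number, discountBps: number): number {
  return Math.round((priceMicros * (10_000 - discountBps)) / 10_000);
}

console.log(discountedMicros(30_000, discountBpsFor(50))); // 27000, i.e. $0.027
```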

The whole lifecycle is three calls:

import { wrapFetchWithPayment } from "@x402/fetch";

// `wallet` is the agent's already-configured x402 signer/client (setup not shown here).
const fetchPay = wrapFetchWithPayment(fetch, wallet);
const BASE = "https://gateway.spraay.app";

// 1. Open a $50 account — lands in the 10% tier ($0.01 to set up)
const acct = await fetchPay(`${BASE}/api/v1/compute-futures/deposit`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ depositor: "0xYourAgentWallet", amount: "50" }),
}).then(r => r.json());

const futuresId = acct.computeFuture.id; // "CFE-ABC12345"

// 2. Run jobs against the balance — $0.001 settlement, discount applied
const job = await fetchPay(`${BASE}/api/v1/compute-futures/execute`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    futuresId,
    type: "text-inference",
    messages: [{ role: "user", content: "Summarize this filing: ..." }],
  }),
}).then(r => r.json());
// → { billing: { charged: "$0.027", balanceRemaining: "$49.973 USDC" },
//     compute: { model: "Llama 3.3 70B" } }

// 3. Refund the unused balance anytime (depositor-only)
await fetchPay(`${BASE}/api/v1/compute-futures/refund`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ futuresId, caller: "0xYourAgentWallet" }),
});

That execute call is the key move. The agent isn't paying retail for the inference on every call: it already paid, in bulk, at a discount, and each job just spends credits (plus the $0.001 settlement noted above). You get budget predictability (the balance is the budget) and volume pricing, without a single API key.
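
Because the balance is the budget, an agent can decide locally whether to keep going. The sketch below assumes the billing object looks like the example response above (dollar strings with an optional "USDC" suffix); that format is inferred from the example, not from a published schema, and both helper names are hypothetical.

```typescript
// Parses strings like "$12.5 USDC" or "$0.027" into integer micro-USDC (1 USDC = 1_000_000).
function parseUsdcMicros(text: string): number {
  const match = /^\$?(\d+)(?:\.(\d{1,6}))?/.exec(text.trim());
  if (!match) throw new Error(`Unrecognized amount: ${text}`);
  const whole = Number(match[1]);
  // Right-pad the fractional digits to six places: "5" becomes 500000 micro-USDC.
  const fraction = Number(((match[2] || "") + "000000").slice(0, 6));
  return whole * 1_000_000 + fraction;
}

// True if the remaining balance, minus an optional reserve, covers another job
// costing about as much as the last one.
function canAffordNext(
  billing: { charged: string; balanceRemaining: string },
  reserveMicros = 0,
): boolean {
  const lastCost = parseUsdcMicros(billing.charged);
  const balance = parseUsdcMicros(billing.balanceRemaining);
  return balance - reserveMicros >= lastCost;
}
```

A workload loop could call canAffordNext(job.billing) after each execute and stop, or trigger a top-up, once it returns false.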

What you can actually run

The same account works across the whole compute surface — text, images, video, audio, embeddings:

Capability	Models and notes
LLM inference	11 models, 3B–405B (Chutes AI / Bittensor SN64, OpenRouter)
Image generation	FLUX Schnell/Dev/Pro, SDXL
Video generation	MiniMax Video 01, Wan 2.1
Speech-to-text	Whisper Large V3, 100+ languages
Text-to-speech	voice synthesis
Embeddings	for RAG and semantic search

And if you'd rather just pay per call, with no prepayment, every one of those is also a direct x402 endpoint (LLM inference at $0.03, GPU run at $0.06, embeddings at $0.005, and so on). There's also a free /compute/estimate endpoint to price a job before you commit, and a free /compute/models endpoint to list what's available.
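
Whether prepaying beats pay-per-call depends on the fees quoted earlier: $0.01 to open an account and $0.001 settlement per execute call. Here is a rough break-even sketch. It assumes those fees are paid on top of the discounted credit deduction, and that the pay-per-call price is the base the discount applies to (the $0.027 charge in the example is consistent with 10% off $0.03). Treat the outputs as back-of-the-envelope figures, not quoted pricing.

```typescript
// All amounts in micro-USDC (1 USDC = 1_000_000) to keep the arithmetic exact.
const SETUP_FEE = 10_000;   // $0.01 account setup, from the deposit example
const EXECUTE_FEE = 1_000;  // $0.001 settlement per execute call

// Smallest number of calls at which prepaid is strictly cheaper than pay-per-call,
// or null if the per-call settlement fee cancels out the discount.
function breakEvenCalls(retailMicros: number, discountBps: number): number | null {
  const savedPerCall = (retailMicros * discountBps) / 10_000 - EXECUTE_FEE;
  if (savedPerCall <= 0) return null;
  return Math.floor(SETUP_FEE / savedPerCall) + 1;
}

console.log(breakEvenCalls(30_000, 500));  // 21 (5% tier, $0.03 LLM calls)
console.log(breakEvenCalls(30_000, 1000)); // 6  (10% tier, $0.03 LLM calls)
console.log(breakEvenCalls(5_000, 1000));  // null (10% of $0.005 is below the $0.001 fee)
```

Under these assumptions the discount pays for itself within a couple of dozen LLM calls, while very cheap calls such as embeddings may not benefit from the discount alone. The free estimate endpoint is the place to confirm real numbers.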

Why x402 makes this work

The reason prepaid compute credits don't require accounts is that the wallet is the account. Settlement is USDC on Base mainnet and Solana mainnet, verified through the x402 facilitator. The depositor address owns the credit balance; only that address can spend it or refund it. No login, no key rotation, no dashboard — the same primitives that make per-call x402 work also make prepaid balances work, just with the payment moved up front.
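
The "wallet is the account" rule can be modeled in a few lines. This is a hypothetical in-memory sketch of the ownership check, not the gateway's code; how the gateway actually authenticates the caller is not described in this post.

```typescript
// A credit balance owned by whichever address funded it.
interface CreditAccount {
  depositor: string;
  balanceMicros: number; // micro-USDC
}

// EVM addresses are case-insensitive hex, so compare them lowercased.
// Solana addresses are case-sensitive base58; this sketch covers the EVM case only.
function isOwner(account: CreditAccount, caller: string): boolean {
  return account.depositor.toLowerCase() === caller.toLowerCase();
}

// Returns the amount to send back and zeroes the balance; rejects anyone but the depositor.
function refund(account: CreditAccount, caller: string): number {
  if (!isOwner(account, caller)) throw new Error("Only the depositor can refund");
  const amount = account.balanceMicros;
  account.balanceMicros = 0;
  return amount;
}
```

There is no password or session in this model: ownership is just an address comparison.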

It's a small idea with a real consequence: an autonomous agent can now hold a compute budget the way a team holds a cloud-credit balance, except it's permissionless, refundable, and settles in about two seconds.

Drop it into your agent

It's packaged as a skill, so any agent can install it:

# OpenClaw / ClawHub
npx clawhub install spraay-compute

# Claude Code, Cursor, Codex, Gemini CLI (Vercel Skills CLI)
npx skills add plagtech/spraay-compute

The skill ships the full endpoint reference and runnable examples, so the agent knows exactly which endpoint to call, what it costs, and how to handle async jobs.

Repo: https://github.com/plagtech/spraay-compute
Gateway: https://gateway.spraay.app
Discovery manifest: https://gateway.spraay.app/.well-known/x402.json

If you're building agents that run real compute workloads, prepaying for a discount beats paying retail on every call. Open an account, run your jobs, refund the rest.