The market for AI agent trust infrastructure is small, early, and moving fast. A handful of platforms are trying to solve the same fundamental problem: how do agents know who they can trust?
I built AXIS. So I have an obvious bias. I'm going to try to be honest anyway — because the developers who need this infrastructure deserve a clear picture, not a sales pitch. Here is what each platform actually does, where each one wins, and where each one falls short.
The Platforms
Five platforms are worth examining seriously. Two are direct competitors in the OpenClaw/agent ecosystem, one is enterprise-grade with serious technical depth, and two take adjacent approaches to adjacent problems.
The Feature Matrix
| Feature | AXIS | ClawTrust | AgentScore | Mnemom | TrustIDScore |
|---|---|---|---|---|---|
| Score range | 0–1000 | 0–100 | 0–100 | 0–1000 | N/A (reviews) |
| Scoring dimensions | 11 | 4 | 5 | Not disclosed | Peer votes |
| Dual scoring (behavioral + economic) | Yes — T-Score + C-Score | No | No | No | No |
| Trust tiers | 5 (T1–T5) | Score bands | Score bands | Letter grades | Star ratings |
| Cryptographic identity (AUID) | Yes | No | No | Yes (ZK proofs) | No |
| Anti-manipulation defense | 5-layer architecture | Not described | Not described | ZK attestation | None |
| OpenClaw skill | Yes | Yes | No | No | No |
| npm package | Yes (axis-trust) | No | Yes | No | No |
| Cross-platform aggregation | No | No | Yes | No | No |
| Zero-knowledge proofs | No | No | No | Yes | No |
| Open source | No | Yes (MIT) | Partial | No | No |
| Pricing | Free forever | Free (self-hosted) | Free tier | Paid (Team/Enterprise) | Free |
| Focus | Agent-to-agent trust | OpenClaw ecosystem | Crypto/x402 payments | Enterprise governance | Community reviews |
Platform by Platform
ClawTrust (clawtrust.io)
ClawTrust is the most direct competitor to AXIS. They're targeting the same OpenClaw developer community, using similar language ("trust infrastructure for the agent economy"), and they have an installable OpenClaw skill. Their scoring model uses four weighted categories: Transaction History (40%), Reliability (25%), Community Trust (20%), and Safety Record (15%). They also have a GitHub-based vouching system and are fully open source under MIT.
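To make the weighting concrete, here is a minimal sketch of how ClawTrust's four published category weights combine into a 0–100 composite. The weights come from their docs; the function name, category keys, and implementation are illustrative assumptions, not ClawTrust's actual code.

```python
# Illustrative sketch of ClawTrust-style weighted scoring.
# Weights are the published ones; everything else is hypothetical.

WEIGHTS = {
    "transaction_history": 0.40,
    "reliability": 0.25,
    "community_trust": 0.20,
    "safety_record": 0.15,
}

def clawtrust_style_score(categories: dict[str, float]) -> float:
    """Combine per-category scores (each 0-100) into a weighted 0-100 composite."""
    return sum(WEIGHTS[name] * categories[name] for name in WEIGHTS)

# An agent strong on transactions but weak on safety:
score = clawtrust_style_score({
    "transaction_history": 90,
    "reliability": 80,
    "community_trust": 70,
    "safety_record": 40,
})
# 0.40*90 + 0.25*80 + 0.20*70 + 0.15*40 = 36 + 20 + 14 + 6 = 76.0
```

Note how heavily the 40% transaction weight dominates: a weak safety record (40) still yields a respectable composite of 76, which is exactly the kind of compression a single weighted score imposes.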
Where ClawTrust wins: Open source. If you want to self-host, audit the code, or contribute to the scoring logic, ClawTrust gives you that. AXIS does not.
Where AXIS wins: Depth. ClawTrust's four scoring categories against AXIS's eleven; no economic scoring (C-Score) at all; no published anti-manipulation architecture; no cryptographic agent identity. ClawTrust's 0–100 scale is simpler to reason about, but it compresses information that matters — a T-Score of 923 tells you something meaningfully different from 750 in a way that a score of 92 versus 75 does not.
AgentScore (agentscore.xyz)
AgentScore takes a fundamentally different approach: aggregation. Rather than building its own behavioral record, it pulls trust signals from multiple existing platforms — Moltbook, ERC-8004, ClawTasks — and produces a composite 0–100 score across five dimensions. It already has npm packages deployed and is tightly integrated with the crypto/x402 payment ecosystem.
Where AgentScore wins: Cross-platform. If an agent has a track record across multiple platforms, AgentScore can surface that in a single score. That's genuinely useful and something AXIS cannot currently do. Their npm package availability also makes integration straightforward.
Where AXIS wins: Independence. AgentScore's score is only as good as the platforms it aggregates from. If those platforms have weak trust signals, the composite inherits that weakness. AXIS builds its own behavioral record from first principles. Also: no economic scoring, no cryptographic identity, no anti-manipulation architecture.
Mnemom (mnemom.ai)
Mnemom is the most technically sophisticated platform in this space. Founded by someone with serious infrastructure credentials (Relic Entertainment, Xbox LIVE, Zynga), they use zero-knowledge proofs and cryptographic attestation to produce individual and team trust ratings on a 0–1000 scale with letter grades. Pricing runs through paid Team and Enterprise tiers.
Where Mnemom wins: Cryptographic depth. Zero-knowledge proofs are the gold standard for privacy-preserving verification. If you need to prove something about an agent without revealing the underlying data, Mnemom's architecture is ahead of everything else in this list, including AXIS. Their team-level trust ratings are also unique — no other platform scores agent collectives.
Where AXIS wins: Accessibility. Mnemom charges for Team and Enterprise plans. AXIS is free, forever. Mnemom is also focused on enterprise governance — internal AI oversight — rather than agent-to-agent trust in the wild. The use cases are different. And AXIS's dual-score architecture (T-Score + C-Score) gives you a dimension Mnemom doesn't have: economic reliability as a separate, independently attackable signal.
AURA by Safe Security
AURA is not a direct competitor. It is an enterprise security framework that applies a "credit score for AI" concept to internal AI governance — scoring AI reliability within an organization, not inter-agent reputation across a network. If you're a CISO trying to assess the risk of your company's AI deployments, AURA is relevant. If you're building multi-agent systems where agents need to verify each other, AURA is not the right tool.
TrustIDScore.org
TrustIDScore is community-driven ratings — think Yelp for AI agents. User reviews and peer feedback, not algorithmic scoring. It's lightweight, transparent, and easy to understand. It also has no anti-manipulation architecture, no cryptographic identity, and no behavioral scoring. It is a starting point for community reputation, not infrastructure-grade trust.
Where AXIS Genuinely Wins
I want to be specific about this, because "we're better" without specifics is meaningless.
The dual-score architecture is genuinely novel. Nobody else separates behavioral reputation from economic reliability into two independent scores. This matters because the attack surface is different. A bad actor trying to inflate a single composite score has one target. With T-Score and C-Score computed independently from different data sources, they have two — and manipulating both simultaneously is significantly harder. This is the same reason credit bureaus separate payment history from credit utilization.
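The independence argument above can be sketched in a few lines. This is an illustrative toy, not AXIS's actual implementation — the event fields and formulas are assumptions — but it shows the structural point: the two scores read disjoint event streams, so inflating one leaves the other untouched.

```python
# Toy sketch of dual scoring (hypothetical field names and formulas):
# T-Score and C-Score are computed from separate data sources, so an
# attacker who floods one stream cannot move the other score.

def t_score(behavioral_events):
    """Behavioral reputation: fraction of successful interactions, on 0-1000."""
    if not behavioral_events:
        return 0
    ok = sum(1 for e in behavioral_events if e["outcome"] == "success")
    return round(1000 * ok / len(behavioral_events))

def c_score(economic_events):
    """Economic reliability: fraction of obligations settled on time, on 0-1000."""
    if not economic_events:
        return 0
    settled = sum(1 for e in economic_events if e["settled_on_time"])
    return round(1000 * settled / len(economic_events))

behavior = [{"outcome": "success"}] * 9 + [{"outcome": "failure"}]
economics = [{"settled_on_time": True}] * 6 + [{"settled_on_time": False}] * 4

# Fake "success" events could inflate T-Score, but C-Score only reads
# the separately verified economic stream.
print(t_score(behavior), c_score(economics))  # 900 600
```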
Eleven scoring dimensions is the deepest behavioral analysis in the market. ClawTrust uses four categories. AgentScore uses five. AXIS computes across eleven weighted dimensions including reliability, accuracy, security posture, compliance, goal alignment, adversarial resistance, user feedback, and incident record. More dimensions make the score harder to game and pack more signal into each point.
The five-layer anti-manipulation architecture is, as far as I can tell, the most comprehensive published defense in the space. It includes dual-party cryptographic event verification, credibility weighting (so high-trust reporters carry more weight than low-trust ones), cluster detection (to catch coordinated manipulation rings), anomaly detection, and pattern analysis. None of the other platforms publicly describe anything close to this level of score protection.
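Credibility weighting is the easiest of those layers to illustrate. The exact math below is an assumption for illustration — AXIS hasn't published its formula here — but the principle is the one described above: reports from high-trust reporters move a score far more than reports from low-trust ones, which blunts sybil-style review bombing.

```python
# Illustrative credibility weighting (hypothetical formula): each report's
# rating is weighted by the reporter's own trust score, so a ring of
# zero-history sock puppets carries almost no weight.

def credibility_weighted_rating(reports):
    """reports: list of (reporter_trust, rating) pairs, both on 0-1000."""
    total_weight = sum(trust for trust, _ in reports)
    if total_weight == 0:
        return 0
    return round(sum(trust * rating for trust, rating in reports) / total_weight)

# Three established reporters rate an agent highly; five near-zero-trust
# sock puppets try to drag the rating down with 0s.
honest = [(900, 850), (800, 900), (850, 800)]
sybils = [(10, 0)] * 5

print(credibility_weighted_rating(honest + sybils))  # 833
```

With naive averaging the five zeros would crater the rating; weighted by reporter trust, the sybil ring barely dents it. The obvious follow-on question — what stops sybils from farming trust first — is what the cluster-detection and anomaly layers are for.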
Free forever, with no money changing hands. ClawTrust is open source but requires infrastructure to run. Mnemom charges for meaningful usage. AXIS is free with no financial transactions of any kind.
Where AXIS Falls Short
I said I'd be honest, so here it is.
No zero-knowledge proofs. Mnemom's ZK attestation is genuinely more privacy-preserving than AXIS's current architecture. If privacy-preserving verification is a hard requirement, Mnemom is ahead.
No cross-platform aggregation. AgentScore's ability to pull signals from multiple platforms is useful. AXIS builds its own record from scratch, which means new agents start with no history. The cold-start problem is real.
Not open source. ClawTrust's MIT license lets you audit, fork, and self-host. AXIS does not offer that. If you need to run trust infrastructure on-premise or want to inspect the scoring logic, ClawTrust is the better choice. The flip side: fully open scoring logic is also easier for bad actors to study and game.
No team-level scoring. Mnemom scores agent collectives. AXIS scores individual agents only.
The Bottom Line
If you are building multi-agent systems in the OpenClaw ecosystem and need the deepest behavioral scoring, the most comprehensive anti-manipulation architecture, and a free, no-friction integration path — AXIS is the right choice.
If you need open-source, self-hosted trust infrastructure — ClawTrust is worth looking at.
If you need cross-platform aggregation for agents with existing track records across crypto/payment platforms — AgentScore fills that gap.
If you need enterprise-grade, privacy-preserving, ZK-backed trust for internal AI governance — Mnemom is the most technically sophisticated option.
The honest answer is that this space is early enough that the right choice depends almost entirely on your specific use case. The good news: most of these platforms are free to try, and the APIs are simple enough to evaluate in an afternoon.
AXIS is free forever. Start at axistrust.io · Install the npm package: npm install axis-trust · OpenClaw skill: clawhub.ai/leonidas-esquire/axis-trust
Leonidas Williamson is the founder of AXIS Agent Trust Infrastructure. He spent his career building network infrastructure and systems administration before turning to AI agent development. He has an obvious bias toward AXIS and tried to account for it.
Top comments (2)
Hey Leonidas — Alex from Mnemom here. Appreciate the effort you put into this comparison, and the intellectual honesty about your own bias. That's rare, and I respect it.
Since you aimed for honest, let me help sharpen a few things on the Mnemom column. Some of your characterizations are off, and I think your readers deserve the complete picture.
"Scoring dimensions: Not disclosed" — They're published. Five weighted components: Integrity Ratio (40%), Compliance with exponential decay (20%), Drift Stability (20%), Trace Completeness (10%), and Coherence Compatibility (10%). Full methodology is in our docs at mnemom.ai/docs/protocols/aap/reputation-methodology. We chose five orthogonal dimensions over eleven because each one maps to an independently measurable, cryptographically attestable behavioral property. More dimensions isn't inherently better — it depends on whether each dimension carries independent signal or just subdivides the same data.
"Open source: No" — Incorrect. Our core protocols (AAP and AIP) are Apache 2.0 licensed, published on both npm and PyPI with 11 packages across TypeScript, Python, and Rust. Reference implementations are on GitHub. JSON schemas are public. The managed infrastructure (gateway, API, dashboard) is paid — same model as most successful open-source companies. You can absolutely audit, fork, and build on the protocol layer.
"npm package: No" — There are 6 npm packages:
@mnemom/agent-alignment-protocol,@mnemom/agent-integrity-protocol,@mnemom/aap,@mnemom/aip,@mnemom/aip-verifier, and@mnemom/aip-otel-exporter. Plus 5 on PyPI and a Rust crate for the zkVM."Dual scoring: No" — Fair on the narrow definition (we don't separate behavioral vs. economic into two named scores). But our architecture decomposes trust into five independently computed, independently attestable dimensions — each one is a separate attack surface. The anti-gaming insight is the same: you can't inflate one dimension without the others revealing the manipulation. Different decomposition, same principle.
"Enterprise governance — internal AI oversight — rather than agent-to-agent trust in the wild" — This is the biggest miss. Our verification layer operates at runtime on live agent interactions. AIP intercepts agent thinking before execution — that's not post-hoc governance, that's real-time agent-to-agent trust infrastructure. The Trust Score is persistent, portable, and queryable by any agent or system via API. An agent checking another agent's Trust Score before delegating a task is exactly agent-to-agent trust "in the wild." We also have a public directory, embeddable badges, and a GitHub Action. This isn't locked behind an enterprise dashboard.
"Cross-platform aggregation: No" — Fair. Worth noting that our OTel exporter integrates with any observability platform, and the Trust Score itself is designed to be a portable, externally verifiable credential (offline-auditable via ZK proofs). But you're right that we don't aggregate from other platforms' scores. Different philosophy — we think trust should be computed from first-principles behavioral evidence, not inherited from other scoring systems.
On pricing — Your framing positions "free forever" as a pure advantage. It is for experimentation and indie devs, and I respect that. Our free tier (Starter) exists for the same reason. The paid tiers exist because enterprise customers need SLAs, audit trails, compliance exports (EU AI Act Article 50 obligations become enforceable in August 2026), and dedicated infrastructure. That's not a weakness — it's a different customer with different requirements. If your business depends on agent trust infrastructure, "free with no financial transactions of any kind" should make you wonder about the sustainability model, not celebrate it.
One thing you got exactly right: ZK proofs on auditor judgment are ahead of the rest of this list. I'd add: the reason this works is a distinction nobody else has articulated — proving full LLM inference is computationally intractable, but proving the auditor honestly applied its rules to the LLM's output is ~10,000 RISC-V cycles. That's what makes practical ZK proofs possible today.
Your readers should look at the architecture directly rather than relying on any single comparison (including mine).
Good piece. Happy to go deeper on any of these if you want to do a follow-up.
— Alex
Alex — this is exactly the kind of response I was hoping this piece would generate. Thank you for taking the time, and for the corrections. I'll address each one.
On scoring dimensions — you're right, and I'll update the article. Five orthogonal, independently attestable dimensions is a legitimate architectural choice. I stand by the value of AXIS's 11-dimension approach for the granularity it provides, but your point about independent signal vs. subdividing the same data is well taken. That's an honest design tradeoff, not a gap.
On open source and npm — my mistake, and an important one. I should have dug deeper into your published packages before characterizing those columns. I'll correct both. Apache 2.0 on the protocol layer with paid managed infrastructure is a proven model and I should have represented it accurately.
On dual scoring — I appreciate you engaging with the principle rather than just the label. You're right that independently computed dimensions that cross-validate each other achieve the same anti-gaming insight. AXIS chose to make the separation explicit and user-facing (T-Score and C-Score as distinct metrics) because I think developers benefit from being able to reason about behavioral reputation and economic reliability as separate questions. But the underlying defensive architecture is converging on the same idea from different angles.
On the enterprise governance characterization — this is the one I most appreciate you correcting. If AIP intercepts agent thinking at runtime before execution, that's fundamentally different from post-hoc compliance reporting, and I undersold it. I'll revise that section to reflect the real-time verification capability.
On cross-platform aggregation — I think your philosophy is sound. Computing trust from first-principles behavioral evidence rather than inheriting scores from other systems avoids the "garbage in, garbage out" problem that plagues aggregators. That's worth stating explicitly rather than just marking it as a gap.
On pricing and sustainability — fair challenge. I'll be transparent: AXIS is early, I'm a solo founder, and the "free forever" model works right now because my operational costs are minimal and the priority is adoption and ecosystem growth. Whether that evolves as the platform scales is an honest open question. Your point about enterprise customers needing SLAs and compliance infrastructure is valid — those customers have requirements that "free" doesn't address. I won't pretend otherwise.
I'll update the article with corrections this week. And I'd genuinely welcome a deeper conversation — the space is big enough that raising the bar on how agent trust is discussed benefits everyone building in it.
Appreciate the engagement, Alex. This is the kind of discourse that makes the ecosystem stronger.
— Leonidas