The market for AI agent trust infrastructure is small, early, and moving fast. A handful of platforms are trying to solve the same fundamental problem: how do agents know who they can trust?
I built AXIS. So I have an obvious bias. I'm going to try to be honest anyway — because the developers who need this infrastructure deserve a clear picture, not a sales pitch. Here is what each platform actually does, where each one wins, and where each one falls short.
The Platforms
Five platforms are worth examining seriously. Two are direct competitors in the OpenClaw/agent ecosystem, one is enterprise-grade with serious technical depth, and two take adjacent approaches to adjacent problems.
The Feature Matrix
| Feature | AXIS | ClawTrust | AgentScore | Mnemom | TrustIDScore |
|---|---|---|---|---|---|
| Score range | 0–1000 | 0–100 | 0–100 | 0–1000 | N/A (reviews) |
| Scoring dimensions | 11 | 4 | 5 | Not disclosed | Peer votes |
| Dual scoring (behavioral + economic) | Yes — T-Score + C-Score | No | No | No | No |
| Trust tiers | 5 (T1–T5) | Score bands | Score bands | Letter grades | Star ratings |
| Cryptographic identity (AUID) | Yes | No | No | Yes (ZK proofs) | No |
| Anti-manipulation defense | 5-layer architecture | Not described | Not described | ZK attestation | None |
| OpenClaw skill | Yes | Yes | No | No | No |
| npm package | Yes (axis-trust) | No | Yes | No | No |
| Cross-platform aggregation | No | No | Yes | No | No |
| Zero-knowledge proofs | No | No | No | Yes | No |
| Open source | No | Yes (MIT) | Partial | No | No |
| Pricing | Free forever | Free (self-hosted) | Free tier | Paid (Team/Enterprise) | Free |
| Focus | Agent-to-agent trust | OpenClaw ecosystem | Crypto/x402 payments | Enterprise governance | Community reviews |
Platform by Platform
ClawTrust (clawtrust.io)
ClawTrust is the most direct competitor to AXIS. They're targeting the same OpenClaw developer community, using similar language ("trust infrastructure for the agent economy"), and they have an installable OpenClaw skill. Their scoring model uses four weighted categories: Transaction History (40%), Reliability (25%), Community Trust (20%), and Safety Record (15%). They also have a GitHub-based vouching system and are fully open source under MIT.
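To make the weighting concrete, here is a minimal sketch of how ClawTrust's four published category weights combine into a 0–100 composite. The weights come from their docs; the function name, category keys, and implementation are illustrative assumptions, not ClawTrust's actual code.

```python
# Illustrative sketch of ClawTrust-style weighted scoring.
# Weights are the published ones; everything else is hypothetical.

WEIGHTS = {
    "transaction_history": 0.40,
    "reliability": 0.25,
    "community_trust": 0.20,
    "safety_record": 0.15,
}

def clawtrust_style_score(categories: dict[str, float]) -> float:
    """Combine per-category scores (each 0-100) into a weighted 0-100 composite."""
    return sum(WEIGHTS[name] * categories[name] for name in WEIGHTS)

# An agent strong on transactions but weak on safety:
score = clawtrust_style_score({
    "transaction_history": 90,
    "reliability": 80,
    "community_trust": 70,
    "safety_record": 40,
})
# 0.40*90 + 0.25*80 + 0.20*70 + 0.15*40 = 36 + 20 + 14 + 6 = 76.0
```

Note how heavily the 40% transaction weight dominates: a weak safety record (40) still yields a respectable composite of 76, which is exactly the kind of compression a single weighted score imposes.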
Where ClawTrust wins: Open source. If you want to self-host, audit the code, or contribute to the scoring logic, ClawTrust gives you that. AXIS does not.
Where AXIS wins: Depth. ClawTrust's four scoring categories against AXIS's eleven; no economic scoring (C-Score) at all; no published anti-manipulation architecture; no cryptographic agent identity. ClawTrust's 0–100 scale is simpler to reason about, but it compresses information that matters — a T-Score of 923 tells you something meaningfully different from 750 in a way that a score of 92 versus 75 does not.
AgentScore (agentscore.xyz)
AgentScore takes a fundamentally different approach: aggregation. Rather than building its own behavioral record, it pulls trust signals from multiple existing platforms — Moltbook, ERC-8004, ClawTasks — and produces a composite 0–100 score across five dimensions. It already has npm packages deployed and is tightly integrated with the crypto/x402 payment ecosystem.
Where AgentScore wins: Cross-platform. If an agent has a track record across multiple platforms, AgentScore can surface that in a single score. That's genuinely useful and something AXIS cannot currently do. Their npm package availability also makes integration straightforward.
Where AXIS wins: Independence. AgentScore's score is only as good as the platforms it aggregates from. If those platforms have weak trust signals, the composite inherits that weakness. AXIS builds its own behavioral record from first principles. Also: no economic scoring, no cryptographic identity, no anti-manipulation architecture.
Mnemom (mnemom.ai)
Mnemom is the most technically sophisticated platform in this space. Founded by someone with serious infrastructure credentials (Relic Entertainment, Xbox LIVE, Zynga), they use zero-knowledge proofs and cryptographic attestation to produce individual and team trust ratings on a 0–1000 scale with letter grades. Pricing runs through paid Team and Enterprise tiers.
Where Mnemom wins: Cryptographic depth. Zero-knowledge proofs are the gold standard for privacy-preserving verification. If you need to prove something about an agent without revealing the underlying data, Mnemom's architecture is ahead of everything else in this list, including AXIS. Their team-level trust ratings are also unique — no other platform scores agent collectives.
Where AXIS wins: Accessibility. Mnemom charges for Team and Enterprise plans. AXIS is free, forever. Mnemom is also focused on enterprise governance — internal AI oversight — rather than agent-to-agent trust in the wild. The use cases are different. And AXIS's dual-score architecture (T-Score + C-Score) gives you a dimension Mnemom doesn't have: economic reliability as a separate, independently attackable signal.
AURA by Safe Security
AURA is not a direct competitor. It is an enterprise security framework that applies a "credit score for AI" concept to internal AI governance — scoring AI reliability within an organization, not inter-agent reputation across a network. If you're a CISO trying to assess the risk of your company's AI deployments, AURA is relevant. If you're building multi-agent systems where agents need to verify each other, AURA is not the right tool.
TrustIDScore.org
TrustIDScore is community-driven ratings — think Yelp for AI agents. User reviews and peer feedback, not algorithmic scoring. It's lightweight, transparent, and easy to understand. It also has no anti-manipulation architecture, no cryptographic identity, and no behavioral scoring. It is a starting point for community reputation, not infrastructure-grade trust.
Where AXIS Genuinely Wins
I want to be specific about this, because "we're better" without specifics is meaningless.
The dual-score architecture is genuinely novel. Nobody else separates behavioral reputation from economic reliability into two independent scores. This matters because the attack surface is different. A bad actor trying to inflate a single composite score has one target. With T-Score and C-Score computed independently from different data sources, they have two — and manipulating both simultaneously is significantly harder. This is the same reason credit bureaus separate payment history from credit utilization.
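The independence argument above can be sketched in a few lines. This is an illustrative toy, not AXIS's actual implementation — the event fields and formulas are assumptions — but it shows the structural point: the two scores read disjoint event streams, so inflating one leaves the other untouched.

```python
# Toy sketch of dual scoring (hypothetical field names and formulas):
# T-Score and C-Score are computed from separate data sources, so an
# attacker who floods one stream cannot move the other score.

def t_score(behavioral_events):
    """Behavioral reputation: fraction of successful interactions, on 0-1000."""
    if not behavioral_events:
        return 0
    ok = sum(1 for e in behavioral_events if e["outcome"] == "success")
    return round(1000 * ok / len(behavioral_events))

def c_score(economic_events):
    """Economic reliability: fraction of obligations settled on time, on 0-1000."""
    if not economic_events:
        return 0
    settled = sum(1 for e in economic_events if e["settled_on_time"])
    return round(1000 * settled / len(economic_events))

behavior = [{"outcome": "success"}] * 9 + [{"outcome": "failure"}]
economics = [{"settled_on_time": True}] * 6 + [{"settled_on_time": False}] * 4

# Fake "success" events could inflate T-Score, but C-Score only reads
# the separately verified economic stream.
print(t_score(behavior), c_score(economics))  # 900 600
```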
Eleven scoring dimensions is the deepest behavioral analysis in the market. ClawTrust uses four categories. AgentScore uses five. AXIS computes across eleven weighted dimensions including reliability, accuracy, security posture, compliance, goal alignment, adversarial resistance, user feedback, and incident record. More dimensions make the score harder to game and pack more signal into each point.
The five-layer anti-manipulation architecture is, as far as I can tell, the most comprehensive published defense in the space. It includes dual-party cryptographic event verification, credibility weighting (so high-trust reporters carry more weight than low-trust ones), cluster detection (to catch coordinated manipulation rings), anomaly detection, and pattern analysis. None of the other platforms publicly describe anything close to this level of score protection.
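Credibility weighting is the easiest of those layers to illustrate. The exact math below is an assumption for illustration — AXIS hasn't published its formula here — but the principle is the one described above: reports from high-trust reporters move a score far more than reports from low-trust ones, which blunts sybil-style review bombing.

```python
# Illustrative credibility weighting (hypothetical formula): each report's
# rating is weighted by the reporter's own trust score, so a ring of
# zero-history sock puppets carries almost no weight.

def credibility_weighted_rating(reports):
    """reports: list of (reporter_trust, rating) pairs, both on 0-1000."""
    total_weight = sum(trust for trust, _ in reports)
    if total_weight == 0:
        return 0
    return round(sum(trust * rating for trust, rating in reports) / total_weight)

# Three established reporters rate an agent highly; five near-zero-trust
# sock puppets try to drag the rating down with 0s.
honest = [(900, 850), (800, 900), (850, 800)]
sybils = [(10, 0)] * 5

print(credibility_weighted_rating(honest + sybils))  # 833
```

With naive averaging the five zeros would crater the rating; weighted by reporter trust, the sybil ring barely dents it. The obvious follow-on question — what stops sybils from farming trust first — is what the cluster-detection and anomaly layers are for.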
Free forever, with no money changing hands. ClawTrust is open source but requires infrastructure to run. Mnemom charges for meaningful usage. AXIS is free with no financial transactions of any kind.
Where AXIS Falls Short
I said I'd be honest, so here it is.
No zero-knowledge proofs. Mnemom's ZK attestation is genuinely more privacy-preserving than AXIS's current architecture. If privacy-preserving verification is a hard requirement, Mnemom is ahead.
No cross-platform aggregation. AgentScore's ability to pull signals from multiple platforms is useful. AXIS builds its own record from scratch, which means new agents start with no history. The cold-start problem is real.
Not open source. ClawTrust's MIT license lets you audit, fork, and self-host. AXIS does not offer that. If you need to run trust infrastructure on-premise or want to inspect the scoring logic, ClawTrust is the better choice. The flip side: fully open scoring logic is also easier for bad actors to study and game.
No team-level scoring. Mnemom scores agent collectives. AXIS scores individual agents only.
The Bottom Line
If you are building multi-agent systems in the OpenClaw ecosystem and need the deepest behavioral scoring, the most comprehensive anti-manipulation architecture, and a free, no-friction integration path — AXIS is the right choice.
If you need open-source, self-hosted trust infrastructure — ClawTrust is worth looking at.
If you need cross-platform aggregation for agents with existing track records across crypto/payment platforms — AgentScore fills that gap.
If you need enterprise-grade, privacy-preserving, ZK-backed trust for internal AI governance — Mnemom is the most technically sophisticated option.
The honest answer is that this space is early enough that the right choice depends almost entirely on your specific use case. The good news: most of these platforms are free to try, and the APIs are simple enough to evaluate in an afternoon.
AXIS is free forever. Start at axistrust.io · Install the npm package: npm install axis-trust · OpenClaw skill: clawhub.ai/leonidas-esquire/axis-trust
Leonidas Williamson is the founder of AXIS Agent Trust Infrastructure. He spent his career building network infrastructure and systems administration before turning to AI agent development. He has an obvious bias toward AXIS and tried to account for it.
Top comments (2)
Hey Leonidas — Alex from Mnemom here. Appreciate the effort you put into this comparison, and the intellectual honesty about your own bias. That's rare, and I respect it.
Since you aimed for honest, let me help sharpen a few things on the Mnemom column. Some of your characterizations are off, and I think your readers deserve the complete picture.
"Scoring dimensions: Not disclosed" — They're published. Five weighted components: Integrity Ratio (40%), Compliance with exponential decay (20%), Drift Stability (20%), Trace Completeness (10%), and Coherence Compatibility (10%). Full methodology is in our docs at mnemom.ai/docs/protocols/aap/reputation-methodology. We chose five orthogonal dimensions over eleven because each one maps to an independently measurable, cryptographically attestable behavioral property. More dimensions isn't inherently better — it depends on whether each dimension carries independent signal or just subdivides the same data.
"Open source: No" — Incorrect. Our core protocols (AAP and AIP) are Apache 2.0 licensed, published on both npm and PyPI with 11 packages across TypeScript, Python, and Rust. Reference implementations are on GitHub. JSON schemas are public. The managed infrastructure (gateway, API, dashboard) is paid — same model as most successful open-source companies. You can absolutely audit, fork, and build on the protocol layer.
"npm package: No" — There are 6 npm packages:
@mnemom/agent-alignment-protocol,@mnemom/agent-integrity-protocol,@mnemom/aap,@mnemom/aip,@mnemom/aip-verifier, and@mnemom/aip-otel-exporter. Plus 5 on PyPI and a Rust crate for the zkVM."Dual scoring: No" — Fair on the narrow definition (we don't separate behavioral vs. economic into two named scores). But our architecture decomposes trust into five independently computed, independently attestable dimensions — each one is a separate attack surface. The anti-gaming insight is the same: you can't inflate one dimension without the others revealing the manipulation. Different decomposition, same principle.
"Enterprise governance — internal AI oversight — rather than agent-to-agent trust in the wild" — This is the biggest miss. Our verification layer operates at runtime on live agent interactions. AIP intercepts agent thinking before execution — that's not post-hoc governance, that's real-time agent-to-agent trust infrastructure. The Trust Score is persistent, portable, and queryable by any agent or system via API. An agent checking another agent's Trust Score before delegating a task is exactly agent-to-agent trust "in the wild." We also have a public directory, embeddable badges, and a GitHub Action. This isn't locked behind an enterprise dashboard.
"Cross-platform aggregation: No" — Fair. Worth noting that our OTel exporter integrates with any observability platform, and the Trust Score itself is designed to be a portable, externally verifiable credential (offline-auditable via ZK proofs). But you're right that we don't aggregate from other platforms' scores. Different philosophy — we think trust should be computed from first-principles behavioral evidence, not inherited from other scoring systems.
On pricing — Your framing positions "free forever" as a pure advantage. It is for experimentation and indie devs, and I respect that. Our free tier (Starter) exists for the same reason. The paid tiers exist because enterprise customers need SLAs, audit trails, compliance exports (EU AI Act Article 50 obligations become enforceable in August 2026), and dedicated infrastructure. That's not a weakness — it's a different customer with different requirements. If your business depends on agent trust infrastructure, "free with no financial transactions of any kind" should make you wonder about the sustainability model, not celebrate it.
One thing you got exactly right: ZK proofs on auditor judgment are ahead of the rest of this list. I'd add: the reason this works is a distinction nobody else has articulated — proving full LLM inference is computationally intractable, but proving the auditor honestly applied its rules to the LLM's output is ~10,000 RISC-V cycles. That's what makes practical ZK proofs possible today.
Your readers should look at the architecture directly rather than relying on any single comparison (including mine).
Good piece. Happy to go deeper on any of these if you want to do a follow-up.
— Alex
Alex — this is exactly the kind of response I was hoping this piece would generate. Thank you for taking the time, and for the corrections. I'll address each one.
On scoring dimensions — you're right, and I'll update the article. Five orthogonal, independently attestable dimensions is a legitimate architectural choice. I stand by the value of AXIS's 11-dimension approach for the granularity it provides, but your point about independent signal vs. subdividing the same data is well taken. That's an honest design tradeoff, not a gap.
On open source and npm — my mistake, and an important one. I should have dug deeper into your published packages before characterizing those columns. I'll correct both. Apache 2.0 on the protocol layer with paid managed infrastructure is a proven model and I should have represented it accurately.
On dual scoring — I appreciate you engaging with the principle rather than just the label. You're right that independently computed dimensions that cross-validate each other achieve the same anti-gaming insight. AXIS chose to make the separation explicit and user-facing (T-Score and C-Score as distinct metrics) because I think developers benefit from being able to reason about behavioral reputation and economic reliability as separate questions. But the underlying defensive architecture is converging on the same idea from different angles.
On the enterprise governance characterization — this is the one I most appreciate you correcting. If AIP intercepts agent thinking at runtime before execution, that's fundamentally different from post-hoc compliance reporting, and I undersold it. I'll revise that section to reflect the real-time verification capability.
On cross-platform aggregation — I think your philosophy is sound. Computing trust from first-principles behavioral evidence rather than inheriting scores from other systems avoids the "garbage in, garbage out" problem that plagues aggregators. That's worth stating explicitly rather than just marking it as a gap.
On pricing and sustainability — fair challenge. I'll be transparent: AXIS is early, I'm a solo founder, and the "free forever" model works right now because my operational costs are minimal and the priority is adoption and ecosystem growth. Whether that evolves as the platform scales is an honest open question. Your point about enterprise customers needing SLAs and compliance infrastructure is valid — those customers have requirements that "free" doesn't address. I won't pretend otherwise.
I'll update the article with corrections this week. And I'd genuinely welcome a deeper conversation — the space is big enough that raising the bar on how agent trust is discussed benefits everyone building in it.
Appreciate the engagement, Alex. This is the kind of discourse that makes the ecosystem stronger.
— Leonidas