DEV Community: AgentGraph

How to Audit Your MCP Servers for Security Risks

AgentGraph — Thu, 16 Jul 2026 17:49:54 +0000

TL;DR: MCP servers run with surprising access to your filesystem, environment variables, and network — and most developers ship them without any security review. mcp-security-scan is an open-source CLI and GitHub Action that checks for credential theft, data exfiltration, and unsafe execution patterns, outputting a 0-100 trust score. It's free, MIT-licensed, and takes about 30 seconds to run.

If you've wired up an MCP server to Claude or another agent runtime, you probably haven't thought too hard about what that server can actually do. That's normal. You were focused on getting the tool calls working.

But MCP servers run in a privileged position. They sit between your agent and the outside world, and they can read files, make network requests, spawn subprocesses, and access environment variables. The Model Context Protocol spec doesn't define a security model — it defines a communication protocol. What happens inside your server is entirely up to you.

That gap is where mcp-security-scan comes in.

What the scanner actually checks

The tool runs static analysis on your MCP server source code (TypeScript, Python, and JavaScript supported) and looks for six categories of risk:

Credential theft patterns — environment variable reads that touch anything matching *_KEY, *_SECRET, *_TOKEN, *_PASSWORD, or *_CREDENTIAL. Not all of these are bugs, but they should be visible. If your weather tool is reading STRIPE_SECRET_KEY, that's worth knowing.

Data exfiltration — outbound HTTP calls from inside tool handlers. Again, not inherently bad. But a tool that's supposed to format a string shouldn't be POSTing to an external endpoint.

Unsafe execution — exec(), eval(), subprocess.run(), child_process.spawn(), and similar. These are the ones that keep security people up at night. An MCP tool that eval's user-supplied input is a remote code execution waiting to happen.

Filesystem access — reads and writes outside the working directory. Particularly dangerous when the agent is passing file paths that come from user input.

Code obfuscation — high-entropy strings, base64-encoded payloads, minified code shipped without source maps. These aren't automatically malicious, but they're a signal.

Dependency confusion risks — package names that shadow popular packages, or dependencies pulled from unusual registries.
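To make the first category concrete, here is a minimal Python sketch of the name matching it describes. It applies the five glob patterns listed above to environment variable names. The helper name is_sensitive_env_name is hypothetical and chosen for illustration; it is not the scanner's API, and the real check may work differently.

```python
import fnmatch

# Glob patterns taken from the credential-theft category described above.
SENSITIVE_PATTERNS = ["*_KEY", "*_SECRET", "*_TOKEN", "*_PASSWORD", "*_CREDENTIAL"]


def is_sensitive_env_name(name: str) -> bool:
    """Return True if the variable name matches any sensitive pattern (case-sensitive)."""
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS)


if __name__ == "__main__":
    for name in ["STRIPE_SECRET_KEY", "GITHUB_TOKEN", "HOME", "LOG_LEVEL"]:
        print(f"{name}: {is_sensitive_env_name(name)}")
```

A match is a visibility signal rather than a verdict: whether a tool has a legitimate reason to read the variable still needs a human to decide.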

Running it

npx @agentgraph/mcp-security-scan ./my-mcp-server

Or install globally:

npm install -g @agentgraph/mcp-security-scan
mcp-security-scan ./my-mcp-server

Python projects work the same way — the scanner detects the runtime from package.json or pyproject.toml.

Output looks like this:

AgentGraph MCP Security Scanner v0.4.1
Scanning: ./my-mcp-server

[PASS] No credential theft patterns detected
[WARN] 3 outbound HTTP calls in tool handlers (lines 47, 112, 203)
[FAIL] eval() usage detected in tools/execute.ts (line 89)
[WARN] Filesystem read outside working directory (tools/reader.ts, line 34)
[PASS] No obfuscated code detected
[PASS] Dependencies look clean

Trust Score: 61/100

Issues requiring attention:
  HIGH   eval() in tool handler — potential RCE vector
  MEDIUM Filesystem traversal — validate paths before use
  LOW    Outbound HTTP in 3 handlers — document or restrict

Run with --json for machine-readable output
Run with --fix for suggested remediations

The trust score integrates directly with AgentGraph's trust badge system, so if you're publishing an MCP server for others to use, you can attach a verified score to your README.

The GitHub Action

Drop this in .github/workflows/security.yml:

name: MCP Security Scan

on:
  push:
    branches: [main]
  pull_request:

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run MCP Security Scan
        uses: agentgraph-co/mcp-security-scan@v1
        with:
          path: ./
          fail-below: 70
          output: sarif

      - name: Upload SARIF results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: mcp-security-scan.sarif

The fail-below threshold is configurable. We default to 70 for most projects. Below that, you typically have a HIGH finding combined with other issues, or a pile of MEDIUMs and LOWs that haven't been addressed. The SARIF output means findings show up natively in GitHub's Security tab, which is useful if you're already using CodeQL or similar.

Architecture decisions (and what we got wrong)

Building a security scanner for something as loosely defined as MCP servers involved some choices worth being upfront about.

Static analysis vs. dynamic analysis. We chose static. Dynamic analysis would catch more — you'd actually run the server and observe its behavior. But it's dramatically harder to do safely. Running arbitrary MCP server code in a sandbox to observe what it does is the kind of thing that requires real infrastructure. Static analysis misses obfuscated runtime behavior, but it's fast, safe, and works in CI without any special setup.

The trade-off: a sufficiently motivated bad actor can evade static analysis. If someone really wants to hide malicious behavior, they'll encode it in a way that pattern-matching won't catch. We know this. The scanner isn't a guarantee — it's a baseline that catches the obvious stuff and raises the cost of shipping something obviously dangerous.

Regex vs. AST parsing. For the first version, we used regex patterns for Python and a lightweight AST walk for TypeScript/JavaScript. Regex is fast and easy to maintain, but it produces false positives. A comment that says # don't use eval() will still trigger the eval check. We're migrating the Python analyzer to Python's built-in ast module in v0.5, which will cut false positives significantly.
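The difference is easy to reproduce. The sketch below is not the scanner's code; it is a small Python comparison of a line-based regex against a walk over the standard library ast module, run on a snippet that contains both a comment mentioning eval() and a real call. The function names and the sample source are made up for illustration.

```python
import ast
import re

SOURCE = """
# don't use eval() here
def handler(expr):
    return eval(expr)
"""


def regex_hits(source: str) -> list[int]:
    """Line numbers where the text 'eval(' appears, comments included."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if re.search(r"\beval\(", line)
    ]


def ast_hits(source: str) -> list[int]:
    """Line numbers of actual calls to a bare eval or exec name."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    print("regex:", regex_hits(SOURCE))  # flags the comment and the call
    print("ast:  ", ast_hits(SOURCE))    # flags only the call
```

The AST walk ignores the comment because comments never reach the syntax tree. It still cannot tell whether the call is reachable, and it misses aliased calls such as builtins.eval, so it narrows the false positives without removing the need for review.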

This is the thing about security tooling: false positives erode trust in the tool itself. If developers learn to ignore the warnings, the scanner becomes noise. We'd rather have fewer, higher-confidence findings than comprehensive but noisy output.

Trust score math. The 0-100 score is a weighted sum of deductions. HIGH findings cost 25 points each (capped at 2 findings), MEDIUM findings cost 10 points each (capped at 3 findings), LOW findings cost 3 points each. The score starts at 100 and is floored at 0, so it never goes negative. This is... fine. It's not a sophisticated risk model. But it produces numbers that feel intuitively right for the cases we tested, and it's transparent enough that developers can understand why their score is what it is.
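As a worked version of that arithmetic, here is a short Python sketch. It assumes "capped" means only the first 2 HIGH and first 3 MEDIUM findings are counted, which is our reading of the description rather than something confirmed from the scanner's source; the shipped tool may apply further adjustments. The function name trust_score is illustrative.

```python
HIGH_COST, HIGH_CAP = 25, 2      # assumption: at most 2 HIGH findings are counted
MEDIUM_COST, MEDIUM_CAP = 10, 3  # assumption: at most 3 MEDIUM findings are counted
LOW_COST = 3                     # LOW findings are not capped


def trust_score(high: int, medium: int, low: int) -> int:
    """Start from 100, subtract weighted findings, and floor the result at 0."""
    deduction = (
        HIGH_COST * min(high, HIGH_CAP)
        + MEDIUM_COST * min(medium, MEDIUM_CAP)
        + LOW_COST * low
    )
    return max(0, 100 - deduction)


if __name__ == "__main__":
    print(trust_score(high=0, medium=0, low=0))  # clean repo
    print(trust_score(high=3, medium=0, low=0))  # third HIGH is ignored by the cap
    print(trust_score(high=2, medium=3, low=7))  # deductions exceed 100, floored at 0
```

Under this reading the HIGH and MEDIUM caps limit those two tiers to 80 points of deductions combined, so reaching 0 takes uncapped LOW findings on top.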

We considered a more sophisticated model: CVSS-style scoring with exploitability and impact dimensions. We decided against it for v1 because the complexity wasn't justified by the signal quality of static analysis. When you're matching regex and AST patterns, you don't actually know if that eval() call is reachable from a tool handler or buried in dead code. Pretending you have CVSS-level precision would be misleading.

Why MCP security is messier than it looks

The Moltbook breach earlier this year exposed 1.5 million API tokens. That incident wasn't an MCP-specific attack, but the attack surface is similar: a platform that aggregates agent tools and credentials, with insufficient verification of what those tools actually do.

OpenClaw's skills marketplace had 12% of submissions flagged for malware in their last public audit. That's not a rounding error. That's roughly one in eight tools doing something it shouldn't.

The problem is that MCP servers are easy to write and increasingly easy to publish. The friction between "I built a thing" and "other people's agents are running my thing" is very low. That's good for the ecosystem's growth. It's bad for security.

The Traceforce launch on HN this week (company-wide security monitoring for AI apps) is hitting the same problem from the enterprise end. They're watching what AI apps do at runtime. We're catching issues before deployment. Both approaches are necessary — the question of "is this agent doing what it claims to do" doesn't have a single answer at a single layer.

Using the API for custom integrations

If you want to integrate the scanner into something beyond a standard CI pipeline, there's a programmatic API exposed by the npm package:

import { MCPSecurityScanner } from '@agentgraph/mcp-security-scan';

const scanner = new MCPSecurityScanner({
  apiKey: process.env.AGENTGRAPH_API_KEY, // optional — enables trust badge publishing
  config: {
    failBelow: 70,
    checks: ['credentials', 'exfiltration', 'execution', 'filesystem'],
    exclude: ['node_modules', '**/*.test.ts']
  }
});

const result = await scanner.scan('./my-mcp-server');

console.log(`Trust Score: ${result.trustScore}`);
console.log(`Findings: ${result.findings.length}`);

for (const finding of result.findings) {
  console.log(`[${finding.severity}] ${finding.message} (${finding.file}:${finding.line})`);
}

// Publish to AgentGraph trust registry (requires API key)
if (result.trustScore >= 70) {
  const badge = await scanner.publishTrustBadge(result);
  console.log(`Badge URL: ${badge.url}`);
  console.log(`Embed in README: ${badge.markdown}`);
}

The API key is only needed if you want to publish results to the AgentGraph trust registry and get a verified badge for your README. Scanning itself is entirely local — nothing leaves your machine without the API key configured.

What the scanner won't catch

Being honest about limitations:

Runtime behavior. If your server loads a malicious plugin at runtime from a URL that isn't in the source code, static analysis won't see it. This is a known gap.

Logic bugs. The scanner doesn't understand what your tool is supposed to do. If your tool is supposed to read files and it reads files, that's a WARN, not a FAIL — even if the specific files it reads are sensitive. You need a human to evaluate whether the behavior is appropriate.

Supply chain attacks in dependencies. We check package names for obvious confusion attacks, but we don't run a full audit of your dependency tree. Use npm audit or pip-audit for that — they're better tools for that specific job.

Obfuscated malice. A determined attacker who encodes their payload in a way that looks like a base64 config string will probably get through. We flag high-entropy strings, but the false positive rate on that check is high enough that many projects will tune it down.
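For readers unfamiliar with entropy checks, the sketch below shows the general idea in Python: compute Shannon entropy per character and flag long strings above a threshold. The threshold of 4.5 bits and the minimum length of 20 are placeholder values picked for illustration, not the scanner's settings.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, estimated from character frequencies."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_high_entropy(text: str, threshold: float = 4.5, min_length: int = 20) -> bool:
    """Flag strings that are both long and close to random-looking."""
    return len(text) >= min_length and shannon_entropy(text) > threshold


if __name__ == "__main__":
    print(looks_high_entropy("a" * 40))                           # repetitive text
    print(looks_high_entropy("abcdefghijklmnopqrstuvwxyz012345"))  # 32 distinct characters
```

This also shows why the check is noisy: legitimate hashes, UUID lists, and embedded config blobs score just as high as an encoded payload.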

The scanner is a floor, not a ceiling. It catches the stuff that shouldn't be there at all — the eval() calls, the credential reads in tools that don't need credentials, the outbound HTTP calls that weren't in the README.

Getting a trust badge for your MCP server

If you're publishing an MCP server — on npm, PyPI, or anywhere else — running the scanner and publishing your score takes about two minutes:

mcp-security-scan . --publish --api-key $AGENTGRAPH_API_KEY

You get a badge like this in your README:

[![AgentGraph Trust Score](https://agentgraph.co/badge/your-server-id.svg)](https://agentgraph.co/server/your-server-id)

The badge links to a public audit page showing what was scanned, what version, and what the findings were. It updates automatically when you push new code through the GitHub Action.

This is the piece that connects to the broader AgentGraph project. The scanner is useful standalone. But the trust badge is what makes the score meaningful to someone who's deciding whether to run your MCP server against their data.

The repo is at github.com/agentgraph-co/mcp-security-scan — MIT licensed, PRs open. We're particularly interested in contributions around the Python AST analyzer and additional check categories.

For the broader trust infrastructure context — DIDs, trust scoring, agent identity — that's at agentgraph.co. The scanner is one piece of it. The goal is a world where you can look at any agent or MCP server and have a real answer to "should I trust this thing."

Right now, the answer to that question is usually "I don't know, the README looked fine." That's not good enough.

This post was generated with AI assistance and reviewed by the AgentGraph team. We're committed to being transparent about that.

AgentGraph Update

AgentGraph — Thu, 09 Jul 2026 17:36:51 +0000

🤖 Auto-generated technical article by AgentGraph's content bot (disclosure at top). Long-form (~1500 words). Break down the 5 vectors mcp-security-scan checks: credential theft patterns, data exfil signatures, unsafe exec (shell/eval), filesystem scope, code obfuscation. For each: what we look for, a real-ish code example of a red flag, weight in the score. Close with GitHub Action YAML snippet + how the score maps to an AgentGraph badge. SEO target: 'MCP server security', 'trust score MCP'.

AgentGraph Update

AgentGraph — Thu, 02 Jul 2026 04:23:57 +0000

Bot-disclosed banner. Long-form (1500-2000 words) walking through 5 concrete attack patterns in MCP servers — credential harvesting via env vars, prompt-injected tool descriptions, silent filesystem traversal, obfuscated payloads in package post-install, exfil via DNS. For each: code sample, why static scanners miss it, how mcp-security-scan flags it. Include CLI examples and GitHub Action snippet. Close with AgentGraph trust badge.

AgentGraph Update

AgentGraph — Thu, 25 Jun 2026 00:36:38 +0000

Long-form technical post (1500-2000 words). Walk through methodology of mcp-security-scan, share aggregate findings (X% had hardcoded creds, Y% had unsafe shell exec, Z% obfuscated code), provide a checklist devs can run themselves. Include code snippets, the CLI install command, GitHub Action YAML. End with note that trust scores integrate with AgentGraph badges. Disclose at top: 'Written and published by the AgentGraph bot account — methodology and data are real.'

AgentGraph Update

AgentGraph — Thu, 18 Jun 2026 00:45:29 +0000

Long-form (~1500 words). Structure: (1) Why 'it sounds right' isn't trust. (2) The four primitives of agent trust: verifiable identity (W3C DIDs), tamper-evident evolution history, third-party security attestation (mcp-security-scan as example), social/transitive trust scoring. (3) Worked example: an MCP server author goes from anonymous repo to badge-bearing verified agent in 5 minutes. (4) Open standards we build on (DSNP, AIP, DID-core). Disclose bot authorship in TL;DR. Heavy on code samples and diagrams.

AgentGraph Update

AgentGraph — Thu, 11 Jun 2026 01:01:28 +0000

Long-form (1500+ words) technical guide: the 5 attack classes mcp-security-scan checks for (credential theft, data exfil, unsafe exec, filesystem access, code obfuscation), with concrete code examples of vulnerable vs safe patterns. Show how to add the GitHub Action in 3 lines of YAML. Final section: how the trust score connects to AgentGraph badges (soft mention). Header disclosure: 'This post was drafted by AgentGraph's content bot and reviewed by our team.'

You can't tell if an MCP server is safe before you install it. So I built a scanner you don't have to trust.

AgentGraph — Sat, 06 Jun 2026 17:24:30 +0000

Most MCP servers and agent tools execute code, hold API keys, or run with broad permissions. There's no easy way to check if one is safe before you wire it into your stack — you're basically running curl | bash and hoping.

So we built a free scanner. Paste any GitHub repo at agentgraph.co/check/{owner}/{repo} (no login) and you get a grade plus the actual findings: hardcoded secrets, unsafe exec, missing auth, dependency risks, OWASP-style flags.

We've scanned ~950 agent/MCP repos so far. The honest headline: most use unsafe code-execution patterns, and high-severity findings show up even in popular, well-maintained projects.

The part I actually care about: you don't have to trust our verdict. Every scan emits an Ed25519-signed "trust envelope" you can verify yourself against our published JWKS — the score, the per-source methodology, all of it. Two SDKs do the verification client-side:

pip install agentgraph-sdk      # Python
npm i agentgraph-trust          # JS/TS

import asyncio
from agentgraph_sdk import AgentGraphClient

async def main():
    async with AgentGraphClient("https://agentgraph.co") as c:
        result = await c.verify("did:web:...")   # checks the signature + freshness locally
        print(result.valid, result.kid)
asyncio.run(main())  # async with must run inside a coroutine

And there's a GitHub Action so a scan runs in CI and drops the grade as a PR comment:

- uses: agentgraph-co/trust-scan-action@v1

It's free, no signup, no secret. Try it on something you actually use — curious what people find.

CTEF v0.3.2 — the substrate gate just closed for cross-framework agent trust

AgentGraph — Wed, 27 May 2026 21:06:36 +0000

If you build agent-to-agent infrastructure, you've probably hit the cross-framework trust problem: how does an MCP agent verify a claim emitted by an x402 service, attested to by an ERC-8004 identity contract, with a behavioral history from a third-party observer?

You can't ask each framework to extend the others. You can't ship a shared authority server (that's the thing the architecture is trying to avoid). You can't just trust JSON-Schema validation (semantically equivalent payloads can serialize to different bytes, and signature verification breaks).

The answer that fell out of 18 months of working-group convergence: a substrate-layer canonical form that every framework can emit and every consumer can verify, with zero cross-framework knowledge required.
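The byte-level problem is easy to demonstrate. The Python sketch below serializes two semantically equal payloads and shows that their hashes differ until both are reduced to one canonical form. The canonical_bytes helper only approximates canonicalization with sorted keys and compact separators; RFC 8785 (JCS) also fixes number formatting, string escaping, and UTF-16 key ordering, none of which is handled here. The payload fields are invented for the example.

```python
import hashlib
import json


def naive_bytes(payload: dict) -> bytes:
    """Serialize with default settings: key order follows insertion order."""
    return json.dumps(payload).encode("utf-8")


def canonical_bytes(payload: dict) -> bytes:
    """Approximate canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Same content, different key order (hypothetical fields).
claim_a = {"subject": "did:example:123", "claim_type": "identity"}
claim_b = {"claim_type": "identity", "subject": "did:example:123"}

if __name__ == "__main__":
    print(claim_a == claim_b)                                             # equal as data
    print(digest(naive_bytes(claim_a)) == digest(naive_bytes(claim_b)))   # hashes differ
    print(digest(canonical_bytes(claim_a)) == digest(canonical_bytes(claim_b)))
```

A signature covers bytes, not meaning, so the emitter and the verifier have to agree on exactly one serialization before signing or checking anything.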

CTEF v0.3.2 publishes that substrate.

What's in v0.3.2

Six normative additions, each driven by a partner-thread interop incident:

Depth-first proof-stripping (corpollc/qntm#7) — implementations MUST recurse into nested chain objects when stripping proofs, not just top-level. Caught when ArkForge's gateway-verdict envelope failed to verify under three otherwise-conformant implementations.
Authority chain composition: scope-narrowing-only (qntm#7) — composed authority claims can ONLY narrow scope, never widen. This closes the privilege-escalation surface that motivated the EU AI Act Article 12 audit-trail framing.
Stale-action policy (A2A #1734) — explicit semantics for what happens when an attestation references a state that has rotated. No more silent acceptance.
Required-vs-informational field discipline (A2A #1672) — every field in the envelope has a normative classification. Conformance harnesses fail-closed on missing required fields.
Behavioral claim_type with TTL-cap MUST — when an attestation carries behavioral evidence (e.g. Dominion Observatory's empirical trust scoring), the TTL is normatively capped to prevent stale-behavior poisoning of long-running agents.
claim_subtype: tier_upgrade registry first entry — ArkForge's tier_upgrade_proof fixture lands as the first reference implementation of the authority-claim registry pattern.
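Two of these rules are simple enough to sketch. The Python below shows one way depth-first proof-stripping and the scope-narrowing-only check could look. The field names proof and chain, the scope strings, and both function names are assumptions made for illustration; the spec's actual envelope layout should be taken from the published vectors.

```python
from typing import Any


def strip_proofs(node: Any, proof_key: str = "proof") -> Any:
    """Return a copy of node with every proof_key entry removed at any depth."""
    if isinstance(node, dict):
        return {
            key: strip_proofs(value, proof_key)
            for key, value in node.items()
            if key != proof_key
        }
    if isinstance(node, list):
        return [strip_proofs(item, proof_key) for item in node]
    return node


def narrows_scope(parent_scope: set[str], child_scope: set[str]) -> bool:
    """A composed claim may only narrow: the child must be a subset of the parent."""
    return child_scope <= parent_scope


# Hypothetical envelope with proofs at three nesting levels.
envelope = {
    "claim": "tier_upgrade",
    "proof": "sig-outer",
    "chain": [
        {"issuer": "a", "proof": "sig-inner", "chain": {"issuer": "b", "proof": "sig-deep"}}
    ],
}

if __name__ == "__main__":
    print(strip_proofs(envelope))
    print(narrows_scope({"read", "write"}, {"read"}))   # narrowing is allowed
    print(narrows_scope({"read"}, {"read", "write"}))   # widening is rejected
```

A top-level-only strip would leave sig-inner and sig-deep in place, so two implementations would hash different bytes for the same envelope.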

The substrate-evidence density

The bar a substrate spec needs to clear before it's actually a substrate (and not just a proposal) is empirical byte-match across multiple independent implementations. The v0.3.2 publish window crosses two such bars:

JCS canonicalization × vector sets: 5 independent JCS implementations validated against 4 distinct vector sets — 20/20 cells byte-identical, 265 byte-for-byte agreements:

Implementation	Lang	CTEF/APS (14)	AP2 OMH v0 (7)	privacy_class v0.1 (13)	per-chain envelope v0 (19)
`rfc8785@0.1.4`	Python (Trail of Bits / William Woodruff)	✓	✓	✓	✓
`canonicalize@3.0.0`	JavaScript (Erdtman; Rundgren contributor)	✓	✓	✓	✓
`gowebpki/jcs@v1.0.1`	Go	✓	✓	✓	✓
`cyberphone/json-canonicalization`	Java (Rundgren — RFC 8785 reference)	✓	✓	✓	✓
`serde_jcs@0.2.0`	Rust (seritalien)	✓	✓	✓	✓

cyberphone/json-canonicalization is Anders Rundgren's reference implementation cited in RFC 8785 itself. When the RFC author's own reference Java impl produces byte-identical output to a Python library, a JavaScript package, a Go module, and a Rust crate — across four independently-authored vector sets covering 53 distinct canonicalization edge cases — the cross-runtime determinism question is closed concretely.

The substrate is reproducible in-tree at agentgraph-co/agentgraph/tests/cross-impl/ — single-file runner per language, run any one and get 53/53 PASS or a divergence report.

Implementations × byte-match validation: 10 independent implementations have all reproduced the CTEF v0.3.2 reference vectors:

AgentGraph (substrate maintainer) · APS · AgentID · @nobulex/crypto · HiveTrust · msaleme/red-team-blue-team-agent-fabric · Foxbook · Dominion Observatory · ArkForge · AlgoVoi (chopmob-cloud).

No coordination. Each implementation built independently, validated independently, produced identical canonical bytes.

What this unlocks

A relying-party agent in 2026 doesn't get to pick the framework its counterparty was built on. An A2A agent might need to verify a claim chain that started life as an x402 settlement-retention anchor, was attested by an ERC-8004 identity registration, and was carried forward into a Dominion Observatory behavioral-trust update — all four ecosystems, four independent emitters, one substrate.

CTEF v0.3.2 lets each of those emitters speak its own protocol semantics on top of byte-equivalent canonical attestations. The consuming agent verifies the JCS_hash + signature against the substrate. If it passes, the claim is verifiable regardless of which framework emitted it.

The architectural pattern: every framework can be a substrate emitter without any framework being authoritative.

What's next

v0.3.2 is the last byte-match-led publish. The substrate is solved — 5 implementations × 53 vectors × 4 author sets is the bar, and the bar has been cleared. What comes next composes ON TOP of that substrate, not against it.

The Consilium pass (aeoess + 8 implementers, substrate window through Jun 5, normative outputs before Jul 1) is the next coordination layer. Five candidate problems are on the table: semantic divergence under byte-match identity, live-state admissibility at commit, cross-jurisdictional receipt portability, legacy receipt format migration, and real-world deployment patterns. Substrate-cred density via byte-match is load-bearing for first-time integrators — it stays in place — but the field has more to give than another stamp on a property that already holds.

v0.3.3 (mid-June) lands the cross-extension URN-layer matrix — a row-per-URN-namespace table that binds substrate emitters to claim_type, evidenceType, and live fixture sets. Four of seven rows are already PR-accepted by maintainers (AlgoVoi, Arian, Erik Newton on Concordia, ArkForge open question). Remaining rows scaffolded for PRs:

urn:erc8004:identity (cryptographic identity)
urn:mycelium:trail (behavioral continuity, argentum-core)
urn:x402:audit-chain (settlement-retention authority)
urn:nobulex:receipt (behavioral continuity, Nobulex AAIF)
urn:observatory:eval (behavioral, Dominion)
urn:foxbook:leaf (cryptographic identity)
urn:concordia:attestation (third-party authority)

v0.4 (Q3 2026) opens APP↔CTEF composability and the Trust Policy Manifest.

Read the spec

Spec: agentgraph.co/docs/ctef-v0-3-2
Conformance vectors: /.well-known/cte-test-vectors.json
Interop harness: /.well-known/interop-harness.json
GitHub: github.com/agentgraph-co/agentgraph

If you maintain a framework that emits trust-relevant attestations, the v0.3.3 cross-extension matrix branch is open for PRs.

AgentGraph Update

AgentGraph — Thu, 21 May 2026 05:14:31 +0000

[🤖 Bot-authored, human-reviewed — disclosed in header] Long-form technical post (1500-2000 words) directly responding to the trending r/LangChain thread. Cover: (1) the impersonation problem in multi-agent graphs, (2) why framework-level identity (LangGraph node IDs, CrewAI roles) isn't portable, (3) W3C DIDs + AIP as a protocol-level fix, (4) code example: assigning a DID to a LangChain agent and verifying peer agents via AgentGraph. Include diagrams. End with onboarding link.

We scanned 26,302 x402 endpoints. 0.41% implement the protocol correctly.

AgentGraph — Tue, 12 May 2026 17:37:36 +0000

We just published State of Agent Security 2026 — a measurement of what's actually shipping across the five major AI agent distribution surfaces: Coinbase x402 Bazaar, OpenClaw skill marketplace, the official MCP Registry, npm/PyPI agent packages, and a sample of AI-generated Solidity from Microsoft-backed Dreamspace.

The pattern is consistent across surfaces, and the numbers are worse than I expected when I started.

What we found

Surface	Targets scanned	Critical/high findings
x402 Bazaar (Coinbase)	26,302 endpoints	only 0.41% implement the spec-required header
OpenClaw skill marketplace	sample of public skill repos	1 in 3 scoring F
Official MCP Registry	300 servers	55.3%
npm agent packages	sample of `crew-ai-`, `langchain-`, etc.	82.6%
PyPI agent packages	sample	31%

That x402 number is the one I keep coming back to. The protocol is how agents are supposed to pay other agents; Coinbase shipped it on Base L2 specifically for agentic commerce. Out of 26,302 advertised endpoints, 107 serve the header the spec requires. The agent-payment surface that's supposed to power autonomous agent commerce is 99.59% empty.
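The percentages follow directly from the two counts quoted above, which a few lines of Python can confirm:

```python
total_endpoints = 26_302  # advertised x402 endpoints scanned
compliant = 107           # endpoints serving the spec-required header

share = compliant / total_endpoints * 100
print(f"{share:.2f}% compliant, {100 - share:.2f}% not")
```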

What good looks like

Half the report is the data above. The other half is the substrate underneath: an open wire format for trust evidence that any implementation can validate against any other implementation, byte-for-byte.

CTEF (Composable Trust Evidence Format) v0.3.1, frozen April 24 2026. RFC 8785 (JCS) canonicalization, Ed25519 signatures (JWS RFC 7515), closed claim_type set {identity, transport, authority, continuity}.

Eight independent implementations now byte-match the same wire format:

AgentGraph (Python) — substrate maintainer
Agent Passport System / APS (Python) — publishes bilateral-delegation + rotation-attestation fixtures
AgentID (Python) — identity layer, live on /verify
@nobulex/crypto (TypeScript) — 4/4 against AgentGraph + 10/10 against APS
HiveTrust (Python) — continuity layer, HAHS schema
ArkForge Trust Layer (Python) — enforcement gateway, live at trust.arkforge.tech
msaleme clean-room canonicalizer (Python) — substrate verifier, 19/19 via trailofbits/rfc8785.py
Foxbook (TypeScript) — identity layer, did:foxbook:{ULID} DID method

Five independent Python canonicalizers + two independent TypeScript canonicalizers + one clean-room reference all producing byte-identical output against the published fixtures.

The point of this exercise: RFC 8785 JCS proves language-agnostic in practice, not just by design. Any one-sided drift fires against seven witnesses.

Why this matters now

Three things collided in the same April 2026 news cycle:

Alchemy CEO Nikil Viswanathan went on the record saying "crypto is the global infrastructure for money that agents need" — and that "computers operate the internet and humans use it; agents will operate finance."
Coinbase's x402 protocol for agent-to-agent payment went live on Base L2.
Microsoft's Dreamspace started shipping AI-generated Solidity into production-adjacent environments.

And EU AI Act Article 12 enforcement begins August 2 2026 — cryptographic, machine-checkable audit logs become mandatory for high-risk AI systems serving the EU market. 82 days.

The agent infrastructure is being built faster than the trust gate.

Read it / reproduce it

Report: https://agentgraph.co/state-of-agent-security-2026
PDF (full litepaper): https://agentgraph.co/state-of-agent-security-2026-v1.pdf
Live test vectors: https://agentgraph.co/.well-known/cte-test-vectors.json
Reproducibility scripts (mirrored in two independent repos): verify-aps-byte-match.mjs + verify-ctef-byte-match.mjs — git clone, node, verify locally.

The substrate scans, the methodology, the eight-impl byte-match conformance set — all reproducible from your terminal in under 5 minutes. There is no AgentGraph-private side channel.

Happy to answer questions in the comments — particularly on methodology, the canonicalization spec, or how your framework (LangChain, CrewAI, AutoGen, AGT, etc.) could plug into the trust layer through the published bridge packages.

AgentGraph Update

AgentGraph — Thu, 23 Apr 2026 05:17:36 +0000

Deep technical post (2000+ words): threat model for MCP (credential theft, exfil, unsafe exec, FS access, obfuscation), methodology, aggregate findings with anonymised examples, how to run mcp-security-scan locally + in CI via GitHub Action. Soft mention that trust scores feed into AgentGraph badges. Clearly disclosed as bot-authored content from AgentGraph team.

We Scanned 231 OpenClaw Skills for Security Vulnerabilities — Here's What We Found

AgentGraph — Tue, 07 Apr 2026 01:47:45 +0000

AI agents are running third-party code on your machine. Last week, Anthropic announced extra charges for OpenClaw support in Claude Code, drawing fresh attention to the ecosystem. We wanted to answer a straightforward question: how safe are the most popular OpenClaw skills?

We first published results from 25 repos. We have now expanded the scan to 231 repositories out of 2,007 discovered — nearly a 10x increase in coverage — and the picture has gotten worse.

Why Independent Trust Verification Matters Now

Anthropic just temporarily banned OpenClaw's creator from accessing Claude (TechCrunch, April 10). Whether you agree with their decision or not, it highlights a structural gap: platform trust is revocable. There's no independent way to verify whether an AI agent or tool is safe to use.

That's why we built agentgraph.co/check — a free, instant safety checker for any AI agent, MCP server, or skill. Paste a URL, get a letter grade. The result is a cryptographically signed attestation that you can verify yourself. No platform controls the score.

Methodology

We used AgentGraph's open-source security scanner to analyze 231 OpenClaw skill repositories from GitHub (out of 2,007 discovered). The scanner inspects source code for:

Hardcoded secrets (API keys, tokens, passwords in source)
Unsafe execution (subprocess calls, eval/exec, shell=True)
File system access (reads/writes outside expected boundaries)
Data exfiltration patterns (outbound network calls to unexpected destinations)
Code obfuscation (base64-encoded payloads, dynamic imports)

It also detects positive signals: authentication checks, input validation, rate limiting, and CORS configuration. Each repo receives a trust score from 0 to 100.

Results Summary

All 231 repositories scanned successfully. The aggregate numbers:

Metric	Value
Repos discovered	2,007
Repos scanned	231
Total findings	14,350
Critical	98
High	6,192
Medium	8,045
Repos with critical findings	20 (9%)
Average trust score	57.0 / 100 (Grade C)
Repos scoring F (0-20)	74 (32%)

Findings by category: file system access accounted for 8,239, unsafe execution patterns for 5,871, data exfiltration patterns for 146, hardcoded secrets for 58, dependency vulnerabilities for 29, and code obfuscation for 7.

Score Distribution

Score Range	Grade	Repos	Percentage
81 - 100	A / A+	118	51%
61 - 80	B / B+	—	—
41 - 60	C	—	—
21 - 40	D	—	—
0 - 20	F	74	32%

The distribution remains bimodal. Just over half of repos (51%) score A or above, and nearly a third (32%) score F. That leaves only 39 repos (17%) across the B, C, and D bands combined. Repos tend to be either clean or deeply problematic, with little in between.
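The banding in the table can be written as a small lookup. This Python sketch maps a score to its letter band and derives the size of the middle three bands from the published counts. It collapses A / A+ and B / B+ into single letters because the table does not say where the plus grades begin, and the function name grade is illustrative.

```python
# Lower bound of each band, highest first, as listed in the score table.
GRADE_BANDS = [(81, "A"), (61, "B"), (41, "C"), (21, "D"), (0, "F")]


def grade(score: float) -> str:
    """Map a 0-100 trust score to its letter band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    raise AssertionError("unreachable: the last band starts at 0")


# Repos outside the A and F bands, from the counts in the table.
middle = 231 - 118 - 74

if __name__ == "__main__":
    print(grade(57.0))                           # the reported average score
    print(middle, round(middle / 231 * 100))     # count and percentage in B, C, D
```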

Notable Findings

openclaw/clawhub (official skill registry)
Score: 0/100. 2 critical, 228 high, 75 medium findings across 200 files. This is the registry that indexes skills for the broader ecosystem.

adversa-ai/secureclaw (OWASP security plugin)
Score: 0/100. 21 critical, 66 high, 177 medium findings. A security-focused plugin that itself has significant findings. The scanner flagged a high density of unsafe execution patterns and file system access.

openclaw/openclaw (main framework)
Score: 0/100. 1 critical, 14 high, 4 medium findings. The core framework that other skills build on.

FreedomIntelligence/OpenClaw-Medical-Skills (medical AI)
Score: 0/100. 1 critical, 30 high, 12 medium findings. Medical AI skills with critical findings deserve particular scrutiny given their potential deployment context.

Not all skills are problematic. tuya/tuya-openclaw-skills scored 95/100, and several others came in at 90/100. The clean repos demonstrate that writing secure OpenClaw skills is entirely achievable — it is just not the norm across the board.

What This Means

When Claude Code or any AI assistant runs a third-party tool, it executes that tool's code with whatever permissions the host process has. If that code contains unsafe exec patterns, broad file system access, or exfiltration vectors, the attack surface is your machine — your files, your environment variables, your credentials.

The finding categories tell the story. The 5,871 unsafe execution findings are eval, exec, subprocess, and shell=True calls scattered across these codebases. The 8,239 file system access findings are code reaching into the filesystem in ways that may not be bounded. The 146 data exfiltration patterns and 58 hardcoded secrets round out the picture.
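For tool authors, most of these patterns have safer standard-library equivalents. The sketch below shows three of them in Python. The helper names are hypothetical, and the path check assumes Python 3.9 or later for `Path.is_relative_to`.

```python
import ast
import subprocess
from pathlib import Path

def parse_literal(text: str):
    """Parse a Python literal (numbers, strings, lists, dicts) without running code.

    Unlike eval(), ast.literal_eval() rejects function calls and attribute access.
    """
    return ast.literal_eval(text)

def run_without_shell(args: list[str]) -> str:
    """Run a command from an argument list so no shell interprets the input."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

def resolve_inside(base: Path, user_path: str) -> Path:
    """Resolve user_path under base and refuse anything that escapes it."""
    base = base.resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"{user_path!r} escapes {base}")
    return target
```

None of these make a tool safe on their own, but they bound what user-supplied input can do: literals cannot execute code, shell metacharacters stay inert arguments, and `../` sequences cannot leave the allowed directory.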

Anthropic's decision to gate OpenClaw behind additional pricing starts to make more sense in this context. The cost is not just computational — it is risk.

New: PyPI Packages and Trust Gateway

Since the initial scan, we have shipped three PyPI packages:

agentgraph-trust (v0.3.1) — the MCP server for scanning tools directly from Claude Code or any MCP-compatible client
agentgraph-agt — the AgentGraph Trust CLI for CI pipelines and local use
open-agent-trust — a lightweight library for embedding trust checks into any Python agent framework

We have also built a trust gateway — an enforcement layer that sits between your agent runtime and third-party tools. Instead of scanning after the fact, the gateway intercepts tool invocations at runtime and makes enforcement decisions based on the tool's trust score: allow, throttle, require user confirmation, or block entirely. The trust tiers (detailed below) drive these decisions automatically.

The gateway turns scan results into policy. A tool scoring 0/100 does not just get a warning — it gets denied execution unless the user explicitly overrides.

Check Your Own Tools

We built an MCP server that lets you check any agent or tool directly from Claude Code.

Install:

pip install agentgraph-trust

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "agentgraph-trust": {
      "command": "agentgraph-trust",
      "env": {
        "AGENTGRAPH_URL": "https://agentgraph.co"
      }
    }
  }
}

Then ask Claude: "Check the security of [agent name]"

It returns a signed attestation with findings, trust score, and boolean safety checks. The attestation is cryptographically signed (Ed25519, JWS per RFC 7515) and verifiable against our public JWKS at https://agentgraph.co/.well-known/jwks.json.
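If you want to look inside an attestation before verifying it, a compact JWS is just three base64url segments joined by dots. The sketch below decodes the header and payload with the standard library only. It assumes the attestation uses the JWS compact serialization with a JSON payload, which this post does not confirm, and it does not check the signature.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWS omits (RFC 7515)."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def inspect_jws(token: str) -> tuple[dict, dict]:
    """Split a compact JWS into its decoded header and payload.

    This only inspects the token. It does NOT verify the signature.
    """
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload
```

For an Ed25519 signature the header's `alg` value should be `EdDSA`. Actual verification means fetching the public key from the JWKS URL above and checking the signature with a maintained JOSE library; do not trust any payload field until that check has passed.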

Public API — Trust-Tiered Rate Limiting

We also built a free public API that any framework can use to check tools before execution. No authentication required.

GET https://agentgraph.co/api/v1/public/scan/{owner}/{repo}
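A minimal way to call this endpoint from Python, using only the standard library, might look like the following. The helper names are ours, and the sketch assumes the endpoint returns a JSON body; the exact response fields are not documented in this post, so the result is returned as a plain dictionary.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://agentgraph.co/api/v1/public/scan"

def scan_url(owner: str, repo: str) -> str:
    """Build the scan URL, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"

def fetch_scan(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the scan result for a GitHub repo (assumes a JSON response body)."""
    with urllib.request.urlopen(scan_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Performs a real network request when run directly.
    print(json.dumps(fetch_scan("openclaw", "openclaw"), indent=2))
```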

The API returns a trust tier with recommended rate limits:

Tier	Score	Rate Limit	Token Budget	User Confirm
verified	96-100	unlimited	unlimited	No
trusted	81-95	60/min	8K	No
standard	51-80	30/min	4K	No
minimal	31-50	15/min	2K	Yes
restricted	11-30	5/min	1K	Yes
blocked	0-10	denied	denied	N/A

Every response includes a signed JWS attestation. Framework authors can use the trust tier to throttle tool execution — spend less compute on risky tools, let clean tools run freely.

This is the foundation for a trust gateway: instead of binary accept/deny, graduated throttling based on verified security posture.
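To show how a framework might act on these tiers, here is a small sketch that maps a score to its row in the table above and turns that row into a decision. The tier boundaries and limits come from the table; the decision strings (`allow`, `throttle:...`, `ask_user`, `block`) and the function names are hypothetical, not the gateway's real interface.

```python
from typing import NamedTuple

class Tier(NamedTuple):
    name: str
    min_score: int      # lowest score that still falls in this tier
    rate_limit: str
    token_budget: str
    user_confirm: bool

# Rows copied from the tier table, ordered from highest to lowest score.
TIERS = [
    Tier("verified", 96, "unlimited", "unlimited", False),
    Tier("trusted", 81, "60/min", "8K", False),
    Tier("standard", 51, "30/min", "4K", False),
    Tier("minimal", 31, "15/min", "2K", True),
    Tier("restricted", 11, "5/min", "1K", True),
    Tier("blocked", 0, "denied", "denied", False),
]

def tier_for(score: int) -> Tier:
    """Return the tier whose score range contains the given 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    return next(tier for tier in TIERS if score >= tier.min_score)

def decide(score: int, user_confirmed: bool = False) -> str:
    """Turn a trust score into a (hypothetical) enforcement decision."""
    tier = tier_for(score)
    if tier.name == "blocked":
        return "block"
    if tier.user_confirm and not user_confirmed:
        return "ask_user"
    if tier.rate_limit == "unlimited":
        return "allow"
    return f"throttle:{tier.rate_limit}"
```

For example, `decide(90)` yields `throttle:60/min`, while `decide(40)` yields `ask_user` until the user confirms, after which the call is throttled at the minimal tier's rate.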

You can also embed a trust badge in your README:

![Trust Score](https://agentgraph.co/api/v1/public/scan/{owner}/{repo}/badge)

Full Data

The scanner and full results are open source:

Scanner: github.com/agentgraph-co/agentgraph
MCP Server: pypi.org/project/agentgraph-trust (v0.3.1) | source
CLI: pypi.org/project/agentgraph-agt
Library: pypi.org/project/open-agent-trust

Try It Now

agentgraph.co/check — Paste any GitHub repo URL, MCP server name, or agent package and get an instant letter grade. No signup, no API key, no cost. The result is a signed attestation you can independently verify.

Seven PyPI packages are available now:

Package	Purpose
agentgraph-trust	MCP server — scan tools from Claude Code or any MCP client
agentgraph-agt	CLI for CI pipelines and local scanning
open-agent-trust	Lightweight library for embedding trust checks in any Python agent
agentgraph-scanner	Core scanning engine
agentgraph-attestation	Cryptographic attestation signing and verification
agentgraph-gateway	Trust gateway enforcement layer
agentgraph-badges	Trust badge generation for READMEs

GitHub Action: Add trust scanning to any CI pipeline. It runs on every PR and blocks merges that introduce tools below your trust threshold. Drop it into your workflow in three lines:

- uses: agentgraph-co/agentgraph-trust-action@v1
  with:
    fail-below: 50

The agent ecosystem needs trust infrastructure. We are building it at agentgraph.co.