Tessl for Tessl

Posted on May 15 • Edited on May 19

Stop trusting your agent skills with vibes. Eliminate the context security risk.

#ai #agents #security #productivity

Plugin quality and performance uplift metrics

When you install an npm package, you can run npm audit. When you install a Python package, there's pip-audit. But when you install plugins that give your AI agent new skills and rules, you know, things that directly shape how it reasons and what it does, what do you run?

If your answer is "nothing", you're not alone, and that's why I built tessl-audit! You can check it out on GitHub and npm.

Why this matters more than you think

Agent plugins are instructions that get loaded into your AI agent's context. A plugin with a security issue doesn't just expose a server endpoint. It can influence the agent's behaviour in ways that are subtle and hard to detect, perhaps nudging it toward unsafe patterns, exposing data it shouldn't, or simply making it worse at its job.

Ask yourself these three questions about your agent skills, and if the answer to any of them is no, you’re seconds away from being able to say yes, with tessl-audit.

Have all your skills been security scanned? If so, what was the result?
Can you prove your skills are any good? Quality scores tell you how well-written and complete a plugin is. A low score means the agent is getting poor guidance.
Do your skills and plugins actually help? Uplift scores measure whether a plugin improves agent task performance compared to a vanilla agent alone.
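To make that third question concrete, here is a minimal sketch of what an uplift number could look like. It assumes uplift is simply the task pass rate with the plugin minus the pass rate of a vanilla agent on the same scenarios. That definition, the function names `pass_rate` and `uplift`, and the sample results are illustrative assumptions, not necessarily how Tessl computes its score.

```python
from typing import Sequence


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of scenarios the agent passed (0.0 for an empty run)."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def uplift(with_plugin: Sequence[bool], baseline: Sequence[bool]) -> float:
    """Assumed definition: pass rate with the plugin minus the vanilla pass rate.

    Positive means the plugin helped on these scenarios, negative means it hurt.
    """
    return pass_rate(with_plugin) - pass_rate(baseline)


# Hypothetical outcomes for the same five scenarios, run with and without a plugin.
baseline_run = [True, False, False, True, False]  # 2 of 5 passed
plugin_run = [True, True, False, True, True]      # 4 of 5 passed

score = uplift(plugin_run, baseline_run)
print(f"uplift: {score:+.0%}")
```

A negative value here would mean the plugin made the agent worse on those scenarios, which is exactly the case you cannot see without measuring.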

Join us at AI Native DevCon (use C0DE30 for 30% discount)

Why not try it right now?

It’s a free open source tool that uses Tessl under the covers. If you have a Tessl project with plugins installed, just run this in your project root:

npx tessl-audit

Wait, is that it? Absolutely, that's it. It reads your tessl.json, fetches live data from the registry for every plugin, and prints a report in about 30 seconds.
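If you're wondering what "reads your tessl.json" might involve, here is a rough sketch of the first step: listing the plugins a manifest declares. The manifest layout (a `dependencies` map with a `version` per plugin) and the helper name `list_plugins` are assumptions made for illustration; the real tessl.json schema and the tool's internals may differ.

```python
import json

# Assumed manifest shape, for illustration only: the real tessl.json schema may differ.
EXAMPLE_MANIFEST = """
{
  "name": "my-project",
  "dependencies": {
    "workspace/plugin-name": {"version": "1.2.0"},
    "workspace/other-plugin": {"version": "0.4.1"}
  }
}
"""


def list_plugins(manifest_text: str) -> list[tuple[str, str]]:
    """Return (plugin, version) pairs from the assumed `dependencies` map, sorted by name."""
    manifest = json.loads(manifest_text)
    deps = manifest.get("dependencies", {})
    return sorted((name, spec.get("version", "unknown")) for name, spec in deps.items())


plugins = list_plugins(EXAMPLE_MANIFEST)
for name, version in plugins:
    # A real audit would look each plugin up in the registry at this point.
    print(f"{name}@{version}")
```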

The script begins by looking through all the context files listed in your tessl.json manifest. This should complete pretty quickly, and you’ll soon see the table below, with a breakdown of your project context and the types of warnings that have been picked up.

Next, the tool prints a posture summary of all your context, with more detail on the riskiest skills in your project and what their issues are.

You can click through on any of these links to see the actual issues in the registry web UI.

And finally, the tool lists the CLI commands to run as next steps (you can also have an agent call these) to optimize your skills and to create and run evals on them.

The "so what" for each finding

Advisory, Risky, or Critical security status?

The report prints each flagged plugin with its warning codes and a direct link to the full security report on the registry. No need to chase them down: the security posture report shows the full summary in one listing and lets you deep dive where needed. Just open the link, read the finding, and decide whether it applies to your use case.

Quality below 80%?

The plugin you’re using is giving your agent incomplete or poorly-structured guidance. Run:

tessl skill review --optimize workspace/plugin-name

This runs a quality review and applies automatic improvements.

No uplift data?

The plugin has never been evaluated against real tasks — so you have no idea if it's helping or hurting. Fix that:

tessl scenario generate --count 5 workspace/plugin-name
tessl eval run workspace/plugin-name

Generate a set of test scenarios from the plugin, then run the eval. You'll get a concrete uplift score showing whether the plugin is worth keeping.
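The three findings in this section can be folded into a small triage function. This is only a sketch: the `PluginReport` fields and the `next_steps` helper are hypothetical names, not part of tessl-audit, and the only values taken from this post are the flagged statuses, the 80% threshold, and the CLI commands shown above.

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_THRESHOLD = 80  # percent, the cut-off used in this post
FLAGGED_STATUSES = {"Advisory", "Risky", "Critical"}  # statuses named in this post


@dataclass
class PluginReport:
    """Hypothetical per-plugin result; field names are illustrative."""
    name: str
    security_status: str
    quality: int              # percent
    uplift: Optional[float]   # None if the plugin has never been evaluated


def next_steps(report: PluginReport) -> list[str]:
    """Map one plugin's findings to the follow-up actions described in this section."""
    steps = []
    if report.security_status in FLAGGED_STATUSES:
        steps.append(f"read the security report for {report.name}")
    if report.quality < QUALITY_THRESHOLD:
        steps.append(f"tessl skill review --optimize {report.name}")
    if report.uplift is None:
        steps.append(f"tessl scenario generate --count 5 {report.name}")
        steps.append(f"tessl eval run {report.name}")
    return steps


# Hypothetical plugin: flagged, below the quality threshold, and never evaluated.
report = PluginReport("workspace/plugin-name", "Risky", 62, None)
for step in next_steps(report):
    print(step)
```

A plugin with no flagged status, quality at or above 80%, and an existing uplift score returns an empty list, meaning there is nothing to follow up on.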

The bigger picture

Every team that uses AI agents is building a dependency graph of skills, rules, and knowledge, just like they build a dependency graph of packages. The tooling for auditing that graph is still being built, but the risks are real and growing.

tessl-audit is a small, practical step: one command, zero installation, actionable output. Run it today and find out what your agent is actually working with.

npx tessl-audit

tessl-audit requires the Tessl CLI (no worries, it’s already a dependency) and an authenticated Tessl session (just create a free account if you haven’t got one). You’ll also need a tessl.json, the context manifest file, in your project in order to run tessl-audit.

Top comments (7)

mote • May 18

The audit framing is correct, but I want to push back on one assumption: that you can afford to run an audit step at all.

On embedded platforms — the robot controllers and edge devices where I spend most of my time — you don't have a CLI runtime, you don't have npx access, and you definitely don't have a dev workflow that includes "run tessl-audit before deployment." Your deployment artifact is a binary flashed onto a board.

The security surface for agent skills on embedded looks completely different. Instead of auditing a tessl.json manifest, you're asking: does this skill descriptor contain anything that could be misinterpreted by the on-device model? Could a malformed context instruction cause the agent to issue an unsafe actuator command? Could a prompt injection happen over the serial debug interface?

These are not hypothetical — we've had a case where a verbose error log from a previous skill got injected into the next context window and caused the agent to misinterpret a safety cutoff as a feature flag.

The audit model works great for server-side agent frameworks. For embedded, the question isn't "how do we audit skills?" — it's "how do we constrain what a skill can ask the agent to do?" That's a different tooling problem entirely.

What's your threat model when the agent isn't in a sandboxed server but controlling something in the physical world?

Theo Valmis • May 20

The npm audit parallel is precise. A plugin that shapes how your agent reasons is code that runs in your environment with your permissions — in the most literal sense. The difference from a malicious npm package is that a bad plugin doesn't have a single executable moment you can point to. It shapes behavior on every inference, subtly and cumulatively, without any discrete execution that logs or alerts would catch.

The uplift score concept is the part teams are least likely to think about. Most teams shipping agent plugins right now have no idea whether their plugins help or hurt — they assume "more context = better performance" and ship it. Measuring whether a plugin actually improves task performance compared to a vanilla agent is the kind of check that feels optional until you realize you've been degrading your agent's output for months and attributing it to the model.

Artemii Amelin • May 16

The npm audit analogy is exactly the right frame and the gap is real. One adjacent problem on the transport side: even with clean skills, if two agents are communicating over an unencrypted channel or authenticating each other by hostname alone, that's another surface. Pilot Protocol (pilotprotocol.network) handles this with bilateral Ed25519 handshakes. Neither agent can send or receive until both sides have explicitly approved using their keypairs. Different layer from what tessl-audit covers but the same principle — don't trust by default, verify cryptographically.

Harjot Singh • May 30

Context security is the under-discussed risk that's going to bite teams hard. Once an agent pulls in skills, tools, MCP servers, and retrieved context, every one of those is an injection surface - a poisoned doc or a malicious "skill" can quietly steer the agent, and "it usually works" (vibes) is not a security posture. Treating context as untrusted input is exactly right.

The mental model that helps: anything entering the context window is attacker-controllable until proven otherwise, same as user input in a web app. So you want provenance on skills, least-privilege on tools, and validation on what comes back - not blind trust because it came from "your" agent. Most people are still at the vibes stage here because nothing's blown up yet, which is precisely when it's cheapest to fix. Important post - this deserves way more attention than the cost debates getting all the airtime.

AudioProducer.ai • May 20

The npm-audit-for-skills frame is the right shape, and the quality + uplift + security trio maps onto a use case that comes up less in the agent-tooling thread: an agent that drafts public content. We run a marketing-worker skill (AudioProducer.ai) that auto-publishes substantive comments and articles to dev.to and Medium, and the audit dimensions translate directly: factual grounding (does the draft cite features that actually exist in our product reference?), brand-voice quality, and uplift (does the skill-encoded voice strategy outperform a plain agent draft on engagement?). The mote subthread on embedded constraints rhymes here too: instead of auditing the skill, we encode hard rules at the consumption surface (no competitive claims, AI-disclosure footer required on every AI-drafted dev.to post, no internal-product-detail leakage) that the skill enforces on every output. The interesting open question on our side is the uplift one: agent-drafted comments accumulate engagement signal slower than agent-drafted articles, and we don't yet have a per-tactic "is this skill actually helping" eval the way tessl-audit gives you per-plugin.

Md Kaif Ansari • May 16

yo awesome
