FavCRM

Originally published at favcrm.io

Tool count is a vanity metric. Annotation coverage is what makes an AI agent safe.

Syndicated from the FavCRM blog. The number that predicts whether an agent is safe to let loose isn't the tool count.

When people compare agentic CRMs, they count tools. The number that actually predicts whether an agent is safe to let loose is a different one: annotation coverage. An MCP tool annotation tells the agent what a tool does to the world: whether it reads or mutates, whether it's safe to retry, whether it reaches an external service. Without annotations, the agent is guessing. This post covers what annotations are and why a catalog's annotation coverage matters more than its tool count.

What an MCP annotation is

Every MCP tool can carry hints in an optional annotations object, alongside its input and output schemas:

  • readOnlyHint: the tool only reads and changes nothing. Safe to call freely.
  • destructiveHint: the tool may delete or overwrite existing data rather than only add to it. The agent should confirm before calling.
  • idempotentHint: calling it twice with the same input has the same effect as calling it once. Safe to retry on a timeout.
  • openWorldHint: the tool reaches an external service (sends an email, charges a card), so its effects leave the system.

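These are not documentation for humans. They are machine-readable signals the agent reasons over before it acts. They are hints rather than guarantees, so they are only as trustworthy as the server that declares them.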
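As a rough illustration, here is how the four hints might look on two tool definitions, written as Python dicts that mirror the JSON a server returns when a client lists its tools. The descriptions, input schemas, and the resolve_hints helper are assumptions made for this sketch, not FavCRM's actual definitions. The defaults follow the MCP specification, which treats a missing destructiveHint or openWorldHint as true and a missing readOnlyHint or idempotentHint as false.

```python
# Defaults the MCP specification assigns when a hint is absent.
SPEC_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}

# Hypothetical tool definitions, shaped like entries in a tool listing.
LIST_MEMBERS = {
    "name": "list_members",
    "description": "List members in the workspace.",
    "inputSchema": {"type": "object", "properties": {}},
    "annotations": {"readOnlyHint": True, "openWorldHint": False},
}

CANCEL_BOOKING = {
    "name": "cancel_booking",
    "description": "Cancel one booking by id.",
    "inputSchema": {
        "type": "object",
        "properties": {"booking_id": {"type": "string"}},
        "required": ["booking_id"],
    },
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": True,
        "idempotentHint": True,
        "openWorldHint": False,
    },
}


def resolve_hints(tool: dict) -> dict:
    """Return all four hints, filling any gaps with the spec defaults."""
    declared = tool.get("annotations") or {}
    return {name: declared.get(name, default) for name, default in SPEC_DEFAULTS.items()}
```

Note that a tool with no annotations at all resolves to "not read-only, possibly destructive, not idempotent, open world", which is the most cautious reading available.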

Why they prevent the worst failures

The dangerous class of agent failure is not "the agent couldn't do something." It's "the agent did the wrong destructive thing because it misread an ambiguous instruction." Delete the customer instead of the tag. Refund the wrong invoice. Cancel every booking instead of one.

Annotations let the agent self-gate. A well-annotated catalog means the agent calls list_members without ceremony but pauses to confirm before cancel_booking, because one is marked read-only and the other destructive. Pre-MCP function-calling had no equivalent — every tool looked the same to the model, so safety lived entirely in the prompt.
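The self-gating described above can be sketched as a small policy function on the client side. This is a minimal illustration under assumptions: the Action enum, gate, and safe_to_retry are hypothetical names, and a real agent runtime would likely layer more rules on top (for example, confirming open-world calls as well). Missing hints fall back to the MCP specification defaults.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CALL = "call"        # proceed without asking
    CONFIRM = "confirm"  # pause and ask the user first


def gate(annotations: Optional[dict]) -> Action:
    """Decide whether a tool call needs confirmation, based on its hints."""
    hints = annotations or {}
    # Read-only tools change nothing, so they can be called freely.
    if hints.get("readOnlyHint", False):
        return Action.CALL
    # A missing destructiveHint counts as true, per the spec default.
    if hints.get("destructiveHint", True):
        return Action.CONFIRM
    # Mutating but explicitly non-destructive (additive) tools pass through.
    return Action.CALL


def safe_to_retry(annotations: Optional[dict]) -> bool:
    """A timed-out call is safe to repeat only if read-only or idempotent."""
    hints = annotations or {}
    return bool(hints.get("readOnlyHint", False) or hints.get("idempotentHint", False))
```

With this policy, list_members marked read-only goes straight through, while cancel_booking marked destructive returns Action.CONFIRM.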

Why coverage matters more than count

A catalog of 190+ tools with 100% annotation coverage is safer than a 30-tool catalog with none. A mutating tool with no destructiveHint is a landmine: unless the client falls back to the MCP specification's conservative default, which treats a missing destructiveHint as true, the agent has no way to know the tool is dangerous until it has already called it. So when you evaluate an agentic CRM, the question isn't "how many tools" but "what fraction are annotated, and are the destructive ones marked?"
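Coverage is straightforward to measure once you pick a definition. The sketch below assumes one possible definition, which is not stated in this post: a tool counts as annotated if it declares readOnlyHint explicitly and, when it is not read-only, also declares destructiveHint explicitly. The catalog entries are made up for illustration.

```python
def is_annotated(tool: dict) -> bool:
    """Assumed rule: explicit readOnlyHint, plus destructiveHint if mutating."""
    hints = tool.get("annotations") or {}
    if "readOnlyHint" not in hints:
        return False
    if hints["readOnlyHint"]:
        return True
    return "destructiveHint" in hints


def annotation_coverage(catalog: list) -> float:
    """Fraction of tools in the catalog that meet the rule above."""
    if not catalog:
        return 0.0
    return sum(is_annotated(tool) for tool in catalog) / len(catalog)


# Hypothetical four-tool catalog: two fully annotated, two not.
CATALOG = [
    {"name": "list_members", "annotations": {"readOnlyHint": True}},
    {"name": "cancel_booking", "annotations": {"readOnlyHint": False, "destructiveHint": True}},
    {"name": "add_tag", "annotations": {"readOnlyHint": False}},  # no destructiveHint
    {"name": "delete_customer"},  # no annotations at all
]

print(f"{annotation_coverage(CATALOG):.0%}")  # 50%
```

A stricter definition could require all four hints on every tool; the point is to agree on the rule before comparing numbers across vendors.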

FavCRM ships 190+ typed tools at 100% annotation coverage. Every mutating tool is flagged, every read-only tool is marked safe, and the agent gates itself accordingly. That is what makes it safe to point a live agent at a real workspace, including sensitive-record verticals such as clinics, where a confused destructive call is unacceptable.

What to ask a vendor

  1. What is your annotation coverage — a number, not "we have annotations"?
  2. Is every destructive tool marked destructiveHint?
  3. Are external-effect tools (email, payments) marked openWorldHint?

If the answers are vague, the safety lives in the prompt, and prompts fail.
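Questions 2 and 3 can also be checked mechanically if you have the vendor's tool listing and your own review of which tools delete data or reach outside the system. The sketch below is hypothetical throughout: audit, the tool names other than cancel_booking, and the two reviewed lists are assumptions. It requires the hints to be set explicitly, rather than relying on defaults, since the question is whether the vendor actually marked them.

```python
def audit(catalog: list, known_destructive: set, known_external: set) -> dict:
    """Report reviewed-as-risky tools that lack an explicit true hint."""
    hints_by_name = {tool["name"]: tool.get("annotations") or {} for tool in catalog}

    def unmarked(names: set, hint: str) -> list:
        # A tool passes only if the hint is present and explicitly True.
        return sorted(n for n in names if hints_by_name.get(n, {}).get(hint) is not True)

    return {
        "unmarked_destructive": unmarked(known_destructive, "destructiveHint"),
        "unmarked_external": unmarked(known_external, "openWorldHint"),
    }


# Hypothetical vendor listing and the reviewer's own classification.
VENDOR_CATALOG = [
    {"name": "cancel_booking", "annotations": {"destructiveHint": True}},
    {"name": "delete_customer", "annotations": {}},
    {"name": "send_email", "annotations": {"openWorldHint": True}},
    {"name": "charge_card"},
]

report = audit(
    VENDOR_CATALOG,
    known_destructive={"cancel_booking", "delete_customer"},
    known_external={"send_email", "charge_card"},
)
```

An empty list under both keys is the concrete answer you want from a vendor; anything else names the tools to ask about.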

See it

Browse the MCP catalog for the full annotated tool surface, or read what an agentic CRM is for the bigger picture. The free tier covers 100 customers and 200 bookings a month.
