We build Strale, an API that gives AI agents business data capabilities — company lookups, compliance checks, financial data, that kind of thing. Agents call strale.do() at runtime and get structured, quality-scored results back.
A few weeks ago we asked ourselves a question we probably should have asked sooner: if an agent tried to discover and use our own API, what would it actually experience?
We didn't know. So we built a tool to find out.
The tool we built for ourselves
We started with a simple internal checker — a script that hit our API the way an agent would and reported what it found. Could it find our llms.txt? Could it parse our OpenAPI spec? Did our MCP endpoint actually respond? Were our error messages machine-readable or just HTML 404 pages?
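To make the idea concrete, here's a minimal sketch of that kind of checker in Python. The probe paths and helper names are illustrative, not Beacon's actual checks:

```python
import json

# Paths an agent typically probes on first contact
# (illustrative list, not Beacon's full check set).
DISCOVERY_PATHS = ["/llms.txt", "/robots.txt", "/openapi.json", "/.well-known/mcp.json"]

def is_machine_readable_error(content_type: str, body: str) -> bool:
    """True if an error response is something an agent can parse:
    a JSON content type with a body that is actually valid JSON."""
    if "application/json" not in content_type.lower():
        return False
    try:
        json.loads(body)
        return True
    except ValueError:
        return False

def check_site(fetch):
    """Run the discovery probes. `fetch(path)` is supplied by the caller
    and returns (status, content_type, body), which keeps the checker
    testable without a live network."""
    report = {}
    for path in DISCOVERY_PATHS:
        status, content_type, _body = fetch(path)
        report[path] = {"found": status == 200, "content_type": content_type}
    return report
```

In practice the real script wired `fetch` to HTTP requests with an agent-like user agent; the injectable version above is just the testable core of the idea.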
The script grew. We added checks for structured data, robots.txt crawler policies, authentication documentation, rate limit headers, content negotiation, schema drift between our spec and our live responses, machine-readable pricing, and a dozen more signals. By the time we stopped adding checks, we had 32 of them across 6 categories.
Then we ran it against our own API.
The first scan: 2/6
We scored 2 out of 6 categories as "agent-ready." Two. On our own product — a platform specifically built for AI agents.
The llms.txt file was there, and our structured data was fine. But our OpenAPI spec had drifted from our actual responses (11 fields didn't match). Our MCP server was discoverable but the functional verification test couldn't complete a handshake. We had no machine-readable pricing. Our error responses at some paths returned HTML instead of JSON. Authentication was documented in our human-readable docs but not in the OpenAPI spec where agents would look for it.
We were building an agent platform that agents couldn't properly use.
Fixing it
We worked through the failing checks one by one. Some were quick — adding securitySchemes to our OpenAPI spec took ten minutes. Publishing JSON-LD pricing data was another fifteen. Fixing the MCP handshake was a real debugging session.
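The securitySchemes fix really is that small. A minimal sketch of what it can look like in OpenAPI 3 YAML (the scheme name and header below are placeholders, not Strale's actual auth):

```yaml
# components.securitySchemes in an OpenAPI 3 spec.
# Scheme name and header are illustrative placeholders.
components:
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
security:
  - ApiKeyAuth: []
```

Ten minutes of work, and agents reading the spec suddenly know how to authenticate instead of guessing from prose docs.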
The hardest part was schema drift. Our spec said one thing; our API returned another. Eleven extra fields that we'd added over time without updating the spec. No human user would notice, but an agent comparing the spec to the response would get confused.
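Detecting that kind of drift is mechanically simple: collect the property names the spec declares for a response schema and diff them against the keys of a live response. A minimal sketch, illustrative rather than Beacon's implementation:

```python
def schema_drift(spec_properties: set, live_response: dict) -> dict:
    """Diff the field names an OpenAPI response schema declares
    against the keys actually present in a live API response."""
    live_keys = set(live_response)
    return {
        # Returned by the API but never documented in the spec
        "undocumented": sorted(live_keys - spec_properties),
        # Promised by the spec but absent from the live response
        "missing": sorted(spec_properties - live_keys),
    }
```

Our eleven extra fields were all in the "undocumented" bucket: harmless to humans, confusing to anything that trusts the spec.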
After three rounds of scanning and fixing, we got to 6/6. Every check passing. It took about two days of focused work, spread across a week.
Why we made it free
We figured other teams probably had the same blind spot. You build a great API, you write docs for humans, you set up a marketing site — and you never check what an agent actually sees when it shows up.
So we put a web interface on the scanner and made it free. No signup, no paywall. You type in a URL, it runs the 32 checks, and you get a report showing exactly what passed and what didn't — with the specific HTTP requests that were made and what the responses contained.
We called it Beacon. It lives at scan.strale.io.
What the 6 categories measure
Discoverability — Can agents find you? Checks llms.txt, robots.txt AI crawler policies, structured data, sitemap coverage, MCP/A2A endpoints.
Comprehension — Can agents understand what you do? OpenAPI spec presence and accuracy, documentation accessibility, schema drift between spec and live responses, machine-readable pricing.
Usability — Can agents interact with you? Authentication documentation in machine-readable formats, signup friction, sandbox availability, error response quality.
Stability — Can agents depend on you? API versioning, changelogs, rate limit headers, terms of service compatibility, security headers.
Agent Experience — What happens when an agent arrives? First-contact response quality, documentation navigability from the root, response format consistency.
Transactability — Can agents do business with you? Machine-readable pricing, self-serve provisioning, agent-compatible checkout protocols, usage/billing transparency.
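Machine-readable pricing shows up in two of these categories. One common way to publish it is schema.org Offer markup in JSON-LD, embedded in a script tag of type application/ld+json. A minimal sketch with placeholder values, not our actual pricing:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example API",
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD",
    "description": "Illustrative per-month plan; all values are placeholders"
  }
}
```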
The thing that surprised us
The gap between "works for humans" and "works for agents" was bigger than we expected. Our API documentation was thorough — for a person reading it in a browser. But an agent doesn't read your docs page. It looks for an OpenAPI spec. It checks securitySchemes. It tries to parse your root response for navigation links. It looks for /.well-known/mcp.json.
Most of the fixes were small. The problem wasn't that our API was bad — it was that the machine-readable layer on top was incomplete or inconsistent.
Three output formats
Every scan produces a web report, a downloadable PDF, and a structured JSON report. The JSON one is designed for a specific workflow: paste it into Claude or ChatGPT and say "fix everything." The JSON includes the exact check that failed, what was tested, what was found, and a fix with a verification command.
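As a simplified sketch of the shape of one failed-check entry (field names here are illustrative, not the exact production schema):

```json
{
  "check": "openapi_security_schemes",
  "category": "Usability",
  "status": "fail",
  "tested": "GET /openapi.json",
  "found": "no components.securitySchemes defined",
  "fix": "Declare your auth scheme under components.securitySchemes",
  "verify": "curl -s https://example.com/openapi.json | jq '.components.securitySchemes'"
}
```

An LLM with that in context has everything it needs: what failed, where, what to change, and how to confirm the change worked.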
MCP server
Beacon also ships as an MCP server, which felt appropriate for an agent-readiness tool. Install it in Claude Code:
claude mcp add strale-beacon -- npx strale-beacon-mcp
Three tools: scan, get_report, list_checks. You can scan domains from inside your development workflow without opening a browser.
Try it on your own product
We're curious what other teams find. Our guess, based on the handful of products we've scanned so far, is that most APIs score 1-2 out of 6 — even APIs that are well-built and well-documented for human consumption.
scan.strale.io — takes about 10 seconds.
We're actively improving Beacon and would love feedback — missing checks, confusing results, things that would make the report more useful. Drop a comment here or email hello@strale.io.