Sattyam Jain

Anthropic bought Stainless. Here's how I'm hardening multi-vendor MCP servers this week.

Quick context for anyone who missed yesterday's news: Anthropic acquired Stainless on 2026-05-18. Stainless is the SDK and MCP-server scaffolding company that has powered every official Anthropic SDK from day one, along with the official SDKs at OpenAI, Google, Cloudflare, Meta's Llama Stack, Runway, Replicate, Cerebras, Groq, and Modern Treasury. TechCrunch confirms the deal at $300M+. The hosted SDK generator is winding down as of today.

If you ship MCP servers in production and you ride more than one model vendor (most production shops do), the practical change is that the producer side of the MCP supply chain and the policy side now share a vendor. The patch cadence, schema-validation defaults, and STDIO posture for Stainless-generated servers are now an Anthropic roadmap decision.

Here's the concrete plan I'm running this week for the agent-airlock CVE regression suite, in case it's useful.

1. Tag every MCP server by provenance

Add a single field to your audit log:

from dataclasses import dataclass
from datetime import datetime
from typing import Literal

@dataclass
class McpServerCallRecord:
    server_name: str
    server_provenance: Literal[
        "stainless-generated",     # SDK or server was generated by Stainless
        "stainless-then-hand-edited",  # Stainless-generated, then forked
        "hand-written",            # never touched Stainless
        "vendor-bundled",          # e.g. Splunk / MongoDB / Elastic / GitLab / Fivetran first-party MCP
        "unknown",                 # default; investigate
    ]
    tool_name: str
    args_hash: str                 # digest of the call arguments, not the raw arguments
    started_at: datetime
    duration_ms: int
    outcome: Literal["ok", "denied", "error"]

The reason this matters: post-acquisition, Stainless-generated server defaults are going to diverge from Anthropic-policy server defaults on a quarterly cadence. You want to be able to grep your audit log for server_provenance = "stainless-generated" when a Stainless codegen update lands, so you know which servers in your fleet you need to re-test first.
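As a rough sketch of that lookup, the snippet below filters an in-memory audit log by provenance and shows one possible way to fill `args_hash`. `AuditEntry`, `hash_args`, and `servers_with_provenance` are hypothetical names used only for illustration, and the trimmed record repeats part of the shape above so the snippet runs standalone. Hashing canonical JSON with SHA-256 is an assumption about how `args_hash` could be computed, not a description of any existing implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AuditEntry:
    # Trimmed, hypothetical stand-in for McpServerCallRecord.
    server_name: str
    server_provenance: str
    tool_name: str
    args_hash: str


def hash_args(args: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that equal
    # arguments always produce the same digest regardless of key order.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def servers_with_provenance(entries: Iterable[AuditEntry], wanted: set[str]) -> list[str]:
    # De-duplicated, sorted server names whose provenance tag is in `wanted`.
    return sorted({e.server_name for e in entries if e.server_provenance in wanted})


log = [
    AuditEntry("billing-mcp", "stainless-generated", "get_invoice", hash_args({"id": 1})),
    AuditEntry("docs-mcp", "hand-written", "search", hash_args({"q": "mcp"})),
    AuditEntry("legacy-mcp", "unknown", "ping", hash_args({})),
    AuditEntry("billing-mcp", "stainless-generated", "get_invoice", hash_args({"id": 2})),
]

print(servers_with_provenance(log, {"stainless-generated"}))  # ['billing-mcp']
```

Passing `{"stainless-generated", "stainless-then-hand-edited", "unknown"}` instead widens the re-test list to forked and untagged servers, which may be the safer default while the fleet is still being tagged.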

2. Move STDIO MCP to deny-by-default (if you haven't already)

This has been best practice since CVE-2026-30623, and it only becomes more important now. The minimal posture:

from agent_airlock import airlock, RbacPolicy, NetworkAirgap
from agent_airlock import E2BSandbox  # assumed import path; E2BSandbox is used below

@airlock(
    rbac=RbacPolicy.deny_all_then_allow(["read_file", "list_files"]),
    network=NetworkAirgap.allow_only(["https://api.attri.ai"]),
    pii_mask=True,
    strip_ghost_args=True,
    sandbox=E2BSandbox(timeout_s=30),
    cost_budget_usd=0.10,
)
def call_mcp_tool(server: str, tool: str, args: dict) -> dict:
    ...

One decorator. The same decorator works whether the downstream MCP server was Stainless-generated, hand-written, or vendor-bundled. That's the property that survives yesterday's deal.
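To make the deny-by-default idea concrete without depending on any particular library, here is a minimal stdlib-only sketch. `deny_by_default` and `ToolDenied` are hypothetical names, and this is not how agent-airlock implements its policy layer; it only shows the core rule that a tool call is rejected unless its name is explicitly allowlisted.

```python
from functools import wraps
from typing import Any, Callable


class ToolDenied(PermissionError):
    """Raised when a tool call is not on the allowlist."""


def deny_by_default(allowed_tools: set[str]) -> Callable:
    # Hypothetical stand-in for a policy decorator: every tool is
    # denied unless its name appears in `allowed_tools`.
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(server: str, tool: str, args: dict) -> Any:
            if tool not in allowed_tools:
                raise ToolDenied(f"{server}:{tool} is not allowlisted")
            return fn(server, tool, args)
        return wrapper
    return decorator


@deny_by_default({"read_file", "list_files"})
def call_mcp_tool(server: str, tool: str, args: dict) -> dict:
    # Placeholder dispatch; a real version would talk to the MCP server.
    return {"server": server, "tool": tool, "args": args}


print(call_mcp_tool("fs", "read_file", {"path": "README.md"})["tool"])  # read_file
try:
    call_mcp_tool("fs", "delete_file", {"path": "README.md"})
except ToolDenied as exc:
    print(exc)  # fs:delete_file is not allowlisted
```

Because the check keys on the tool name rather than on anything about the server, the same wrapper applies regardless of how the downstream server was produced.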

3. Pin and version-watch your Stainless-generated SDKs

Existing Stainless customers keep what they generated — TechCrunch and the Anthropic FAQ both confirm this — but the upstream is closed to new signups. So:

  • Pin every Stainless-generated SDK to an explicit version in your lockfile.
  • Set up a weekly diff check against the last open snapshot of the Stainless template repo, if one is available. The template repos will likely become Anthropic-private over the next 30 days, so it is worth archiving a frozen copy today.
  • Treat any future "Stainless SDK update" notice as a security event requiring re-test, not a routine dependency bump.
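One way to approach the weekly diff check is to hash every file in a frozen snapshot and compare it against a newer checkout. The sketch below assumes both snapshots are plain directories on disk; `snapshot_manifest` and `diff_manifests` are hypothetical helper names, and the directory paths in the usage note are placeholders.

```python
import hashlib
from pathlib import Path


def snapshot_manifest(root: Path) -> dict[str, str]:
    # Map each file's path (relative to root) to the SHA-256 of its bytes.
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    # Classify every path as added, removed, or changed between snapshots.
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

A scheduled job could call `diff_manifests(snapshot_manifest(Path("frozen/")), snapshot_manifest(Path("latest/")))` and open a re-test ticket whenever any of the three lists is non-empty.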

4. Add HarnessAudit-Bench to your regression suite

The HarnessAudit paper from UCSD / Florida / Princeton (arXiv 2605.14271) shipped a 210-task benchmark scoring agent harnesses on resource-access violations and inter-agent information-transfer violations across 8 real-world domains. Those are the two failure modes that an MCP-hardening layer should be peer-comparable on.

Concretely: I'm wiring harness-audit-bench into the agent-airlock CI as a nightly job this week. If you're shipping a competing layer, this is the bench number that's going to matter in the next 60 days of buyer conversations.
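A nightly job like this usually needs a gate that fails the build when a violation rate gets worse. The sketch below is only an illustration of that gate: `BenchSummary` is a hypothetical summary shape (the benchmark's real output format may differ), the field names simply mirror the two violation categories described above, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchSummary:
    # Hypothetical summary shape; the real benchmark output may differ.
    tasks: int
    resource_access_violations: int
    info_transfer_violations: int


def regressions(baseline: BenchSummary, current: BenchSummary, tolerance: float = 0.0) -> list[str]:
    # Compare violation rates (violations per task), not raw counts,
    # so the gate still works if the task count changes between runs.
    # Assumes both summaries have tasks > 0.
    failures = []
    for field in ("resource_access_violations", "info_transfer_violations"):
        before = getattr(baseline, field) / baseline.tasks
        after = getattr(current, field) / current.tasks
        if after > before + tolerance:
            failures.append(f"{field}: {before:.3f} -> {after:.3f}")
    return failures


# Made-up numbers, for illustration only.
baseline = BenchSummary(tasks=210, resource_access_violations=4, info_transfer_violations=2)
tonight = BenchSummary(tasks=210, resource_access_violations=4, info_transfer_violations=3)
print(regressions(baseline, tonight))  # ['info_transfer_violations: 0.010 -> 0.014']
```

In CI, a non-empty list would translate into a non-zero exit code so the nightly run is marked as failed.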

5. The competitive landscape, briefly

If you're picking up a vendor-neutral MCP hardening layer for the first time, the three options on the table:

  1. Microsoft Agent Governance Toolkit (April 2026, microsoft/agent-governance-toolkit, MIT). 7 packages, sub-millisecond policy enforcement, OWASP Agentic Top 10 mapping. Framework-agnostic on paper, Azure-deployment-pinned in practice.
  2. Roll your own around OWASP Agentic Top 10 (2026). Where most production shops actually are. Cost is operational drift.
  3. agent-airlock (sattyamjjain/agent-airlock, MIT). v0.8.1, 2,405 tests, 11 framework adapters, 10+ MCP CVE regression. Decorator-first, vendor-neutral by construction.

(Disclosure: I ship agent-airlock. The plan above is what I'm running today. Pick the option that matches your team's deployment posture, not the loudest one.)

What to watch in the next 30 days

The question I don't have a clean answer for is whether OpenAI / Google / Cloudflare / Meta / Runway move to replace Stainless with a single neutral vendor (Vercel? Cloudflare itself? a YC-backed analog?) or whether the open-source MCP-server-codegen lane hardens fast enough to absorb the demand. Either outcome shifts the default-trust posture further from "trust the producer," which is good for everyone running multi-vendor agents.

Open thread: how is your team tiering MCP server provenance after yesterday? Drop a comment.
