Prompt cache, finally typed: shipping llm-ports 0.1.0-alpha.19

#ai #typescript #architecture #llm

TL;DR. @llm-ports 0.1.0-alpha.19 ships a typed, provider-neutral surface for prompt caching across Anthropic, OpenAI, and Gemini. One optional cacheControl field on every request, four modes. Plus a BREAKING field rename: cost.cacheDiscountUSD → cost.cacheSavingsUSD. This is the third of four shape-locks before beta.0 on 2026-06-30.

Install: npm install @llm-ports/core@alpha resolves to it now.

Why prompt cache needed a port surface in the first place

Three providers, three completely different mechanisms.

Anthropic does explicit cache_control: { type: "ephemeral" } markers placed on message-content blocks. You decide which blocks to cache. The provider's cache lookup is keyed on the prefix of cached blocks. You pay a write rate on the first call and a read rate on subsequent calls.

OpenAI does implicit always-on caching. There is no API to opt in or out. The system caches what it caches, you pay the discount rate when it hits. No control surface, no API contract.

Gemini does createCachedContent returning a handle. You call the cache-creation API once with the content you want cached, get back an opaque handle, then pass that handle on subsequent requests instead of the original content.

If you write an application against all three, you write three completely different caching code paths. If you want to switch providers, you rewrite that code path. If you want to add a fourth provider, you add a fourth path.

This is exactly the situation ports-and-adapters exists to fix. We just hadn't fixed it yet.

The locked shape

import type { CacheControl } from "@llm-ports/core";

interface CacheControl {
  mode: "auto" | "manual" | "preCreated" | "off";
  ttlSeconds?: number;
  breakpoints?: Array<{ at: "tools" | "system" | "message-index"; index?: number }>;
  cachedContentHandle?: string;
  namespace?: string;
}

cacheControl? is now optional on GenerateTextOptions, GenerateStructuredOptions, StreamTextOptions, StreamStructuredOptions, RunAgentOptions. Omitting it is equivalent to { mode: "auto" }: every adapter does whatever its provider does by default. Code that worked under alpha.18 without touching cache control works identically under alpha.19.

The four modes map onto the three provider patterns:

Mode	Anthropic	OpenAI	Gemini
`auto`	place marker at last static block	no-op (implicit cache always on)	no-op
`manual`	place markers at supplied breakpoints	no-op	no-op
`preCreated`	no-op	no-op	uses `cachedContentHandle`
`off`	strip `cache_control` from blocks	no-op (no API to disable)	no-op (no API to disable)
`ttlSeconds`	300 or 3600	ignored	passed through

namespace partitions cache lookups by tenant when you front the provider with Helicone or a similar caching proxy. Setting namespace: "tenant:acme" keeps tenant A's cache from spilling into tenant B's lookups, which is the kind of thing that doesn't show up in a benchmark but does show up in your cross-tenant data-leak postmortem.

The breaking change

cost.cacheDiscountUSD is renamed to cost.cacheSavingsUSD on every result object.

The semantics are unchanged: USD the caller did not pay because they hit prompt cache. The field is still optional and only populated when the provider returned cache telemetry. What changed is the name, and the name was wrong.

"Discount" implied the provider was generously giving you a price reduction. They aren't. It's your money you didn't spend because you re-sent the same context. OpenInference's llm.cost.cache_savings, Helicone's dashboards, Langfuse's cost attribution — every observability vendor in the field already uses "savings". We were the odd one out.

- if (result.cost.cacheDiscountUSD !== undefined) {
-   metrics.cacheSavings.record(result.cost.cacheDiscountUSD);
- }
+ if (result.cost.cacheSavingsUSD !== undefined) {
+   metrics.cacheSavings.record(result.cost.cacheSavingsUSD);
+ }

TypeScript catches every read site. Runtime code that hand-rolled the old field name silently resolves to undefined, so check your dashboard queries too. Migration guide with the full diff and a couple of gotchas: docs/migration/alpha-18-to-alpha-19.md.

Worked example

A multi-turn Anthropic conversation with a long stable system prompt and a short turn-by-turn user message:

import { createAnthropicAdapter } from "@llm-ports/adapter-anthropic";

const adapter = createAnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY! });
const port = adapter.createLLMPort("claude-opus-4-7", "claude-opus");

const result = await port.generateText({
  taskType: "longform-summary",
  instructions: theBookEqualsLongSystemPrompt,
  prompt: thisTurnsShortUserQuestion,
  cacheControl: {
    mode: "manual",
    breakpoints: [{ at: "system" }],
    ttlSeconds: 3600,
  },
});

console.log(result.usage.cacheReadTokens);   // tokens served from cache
console.log(result.usage.cacheWriteTokens);  // tokens committed to cache
console.log(result.cost.cacheSavingsUSD);    // dollars you didn't pay

The same call sites work on OpenAI and Gemini. The cacheControl field is accepted everywhere; it's a no-op where the provider has no equivalent. Switching providers does not require rewriting the cache path.

Shape is locked. Behaviors aren't.

This is the part of the contract worth understanding.

The shape of CacheControl is locked at alpha.19. Every field, every literal, every union member is committed. beta.0 ships it unchanged.

Per-mode adapter behaviors mature across beta minors without breaking the shape:

Better static-block detection for Anthropic's mode: "auto".
Helicone proxy header forwarding for namespace.
Gemini createCachedContent handle lifecycle helpers (creation, refresh, expiration) under @llm-ports/capabilities.

If you write call sites against this shape today, the breakpoint placement Anthropic does in beta.1 is "more correct" placement of the same markers your code already specified. Your call sites don't change. The shape is the contract; the implementation matures behind it.

What's coming

The next 18 days finish the shape-lock sequence:

alpha.20 (Tue 2026-06-17). BudgetScope. Five tiers from tenant → customer → user → agent → session. Minute-grain windows because some providers (Cerebras's 30 RPM, Groq's 10 RPM on certain models) have rate limits that can't be expressed in cost:N/day no matter how creatively you squint at the math.
alpha.21 (Fri 2026-06-20). Observability hook signatures aligned with OpenTelemetry's gen_ai.* semantic conventions. onCost, onTokenUsage, onFallback, onValidationRetry, onCacheHit. Drop-in Langfuse / Phoenix / OpenLLMetry / Datadog wire-up. Zero adapter code on your side.
beta.0 (Tue 2026-06-30). Scope-closed. Contract goes load-bearing. Surface stops moving.

Then beta minors stop locking and start delivering things that should have existed already:

beta.1 (Tue 2026-07-28). Resilient fallback preset (no more rolling your own circuit breaker around runtimeFallback.shouldFallback). Adapter-anthropic stops wasting your money on extended-thinking models that spend their entire output budget on hidden reasoning. Auto multi-turn cache_control placement for stable system prompts. Four capability factories you've been writing yourself: createDetector, createTagger, createAnswerer, createResponder.
beta.2 (Tue 2026-08-11). Persistent budget backend. Your cost gates survive process restart. Token-denominated mode for "$X per 1M tokens per tenant per day". Pluggable CacheBackend so your exact-prompt response cache lives where you want it (Redis, file, your own KV). Three more factories: createRewriter, createDecider, createExpander.
beta.3 (Tue 2026-08-25). RerankPort gets its first adapter. Three harder factories: createRedactor for PII-safe prompts, createAgent ergonomics so multi-turn tool-use stops feeling like assembling IKEA without instructions, capability-wrapper around RerankPort.
1.0.0 (Mon 2026-09-08). The contract goes load-bearing.

Test stats

626 passing across 7 packages, up from ~615 in alpha.18. 11 new tests in packages/core/tests/cache-control.test.ts cover the shape lock and the rename. Two existing cost tests updated. Zero behavioral regressions.