<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Babak Abbaschian </title>
    <description>The latest articles on DEV Community by Babak Abbaschian  (@baabbakk).</description>
    <link>https://dev.to/baabbakk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3947119%2F764f8ebb-d600-4ceb-adf5-34edfd249254.png</url>
      <title>DEV Community: Babak Abbaschian </title>
      <link>https://dev.to/baabbakk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/baabbakk"/>
    <language>en</language>
    <item>
      <title>Prompt cache, finally typed: shipping llm-ports 0.1.0-alpha.19</title>
      <dc:creator>Babak Abbaschian </dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:10:03 +0000</pubDate>
      <link>https://dev.to/baabbakk/prompt-cache-finally-typed-shipping-llm-ports-010-alpha19-20kp</link>
      <guid>https://dev.to/baabbakk/prompt-cache-finally-typed-shipping-llm-ports-010-alpha19-20kp</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; &lt;code&gt;@llm-ports&lt;/code&gt; 0.1.0-alpha.19 ships a typed, provider-neutral surface for prompt caching across Anthropic, OpenAI, and Gemini. One optional &lt;code&gt;cacheControl&lt;/code&gt; field on every request, four modes. Plus a BREAKING field rename: &lt;code&gt;cost.cacheDiscountUSD&lt;/code&gt; → &lt;code&gt;cost.cacheSavingsUSD&lt;/code&gt;. This is the third of four shape-locks before &lt;code&gt;beta.0&lt;/code&gt; on 2026-06-30.&lt;/p&gt;

&lt;p&gt;Install: &lt;code&gt;npm install @llm-ports/core@alpha&lt;/code&gt; now resolves to this release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why prompt cache needed a port surface in the first place
&lt;/h2&gt;

&lt;p&gt;Three providers, three completely different mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; does explicit &lt;code&gt;cache_control: { type: "ephemeral" }&lt;/code&gt; markers placed on message-content blocks. You decide which blocks to cache. The provider's cache lookup is keyed on the prefix of cached blocks. You pay a premium write rate on the call that populates the cache and a discounted read rate on later calls that hit it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt; does implicit always-on caching. There is no API to opt in or out. The system caches what it caches, and you pay the discounted rate when a request hits. No control surface, no API contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; does &lt;code&gt;createCachedContent&lt;/code&gt; returning a handle. You call the cache-creation API once with the content you want cached, get back an opaque handle, then pass that handle on subsequent requests instead of the original content.&lt;/p&gt;

&lt;p&gt;If you write an application against all three, you write three completely different caching code paths. If you want to switch providers, you rewrite that code path. If you want to add a fourth provider, you add a fourth path.&lt;/p&gt;

&lt;p&gt;This is exactly the situation ports-and-adapters exists to fix. We just hadn't fixed it yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The locked shape
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CacheControl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@llm-ports/core&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CacheControl&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;manual&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;preCreated&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;off&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;ttlSeconds&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;breakpoints&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tools&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;message-index&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;index&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;cachedContentHandle&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;cacheControl?&lt;/code&gt; is now an optional field on &lt;code&gt;GenerateTextOptions&lt;/code&gt;, &lt;code&gt;GenerateStructuredOptions&lt;/code&gt;, &lt;code&gt;StreamTextOptions&lt;/code&gt;, &lt;code&gt;StreamStructuredOptions&lt;/code&gt;, and &lt;code&gt;RunAgentOptions&lt;/code&gt;. Omitting it is equivalent to &lt;code&gt;{ mode: "auto" }&lt;/code&gt;: every adapter does whatever its provider does by default. Code that worked under alpha.18 without touching cache control works identically under alpha.19.&lt;/p&gt;

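&lt;p&gt;The four modes map onto the three provider patterns as follows. The last row is not a mode; it shows how each adapter treats the separate &lt;code&gt;ttlSeconds&lt;/code&gt; field:&lt;/p&gt;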

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode / field&lt;/th&gt;
&lt;th&gt;Anthropic&lt;/th&gt;
&lt;th&gt;OpenAI&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;auto&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;place marker at last static block&lt;/td&gt;
&lt;td&gt;no-op (implicit cache always on)&lt;/td&gt;
&lt;td&gt;no-op&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manual&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;place markers at supplied breakpoints&lt;/td&gt;
&lt;td&gt;no-op&lt;/td&gt;
&lt;td&gt;no-op&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;preCreated&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no-op&lt;/td&gt;
&lt;td&gt;no-op&lt;/td&gt;
&lt;td&gt;uses &lt;code&gt;cachedContentHandle&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;off&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;strip &lt;code&gt;cache_control&lt;/code&gt; from blocks&lt;/td&gt;
&lt;td&gt;no-op (no API to disable)&lt;/td&gt;
&lt;td&gt;no-op (no API to disable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ttlSeconds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;300 or 3600&lt;/td&gt;
&lt;td&gt;ignored&lt;/td&gt;
&lt;td&gt;passed through&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
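
&lt;p&gt;One way to read the table is as a lookup from mode and provider to an adapter action. The sketch below is illustrative only: &lt;code&gt;describeCacheAction&lt;/code&gt; and the action strings are hypothetical names, not part of &lt;code&gt;@llm-ports&lt;/code&gt;, and the &lt;code&gt;CacheControl&lt;/code&gt; type is a trimmed local copy of the shape above so the snippet stands alone.&lt;/p&gt;

```typescript
// Trimmed local copy of the locked shape, so this sketch is self-contained.
type Mode = "auto" | "manual" | "preCreated" | "off";
type Provider = "anthropic" | "openai" | "gemini";
type CacheControl = {
  mode: Mode;
  ttlSeconds?: number;
  cachedContentHandle?: string;
  namespace?: string;
};

// Hypothetical action names; each cell mirrors one cell of the mode table.
const actions: { [M in Mode]: { [P in Provider]: string } } = {
  auto: { anthropic: "mark-last-static-block", openai: "no-op", gemini: "no-op" },
  manual: { anthropic: "mark-supplied-breakpoints", openai: "no-op", gemini: "no-op" },
  preCreated: { anthropic: "no-op", openai: "no-op", gemini: "use-cached-content-handle" },
  off: { anthropic: "strip-cache-control", openai: "no-op", gemini: "no-op" },
};

// Omitting cacheControl is treated as { mode: "auto" }.
function describeCacheAction(provider: Provider, cacheControl?: CacheControl): string {
  const mode = cacheControl?.mode ?? "auto";
  return actions[mode][provider];
}

console.log(describeCacheAction("anthropic"));
console.log(
  describeCacheAction("gemini", { mode: "preCreated", cachedContentHandle: "hypothetical-handle" }),
);
```

&lt;p&gt;Keeping the mapping as data rather than branching logic makes the no-op cells explicit, which is the point of the table: a mode that a provider cannot honor is accepted and ignored, not rejected.&lt;/p&gt;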

&lt;p&gt;&lt;code&gt;namespace&lt;/code&gt; partitions cache lookups by tenant when you front the provider with Helicone or a similar caching proxy. Setting &lt;code&gt;namespace: "tenant:acme"&lt;/code&gt; keeps tenant A's cache from spilling into tenant B's lookups, which is the kind of thing that doesn't show up in a benchmark but does show up in your cross-tenant data-leak postmortem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The breaking change
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;cost.cacheDiscountUSD&lt;/code&gt; is renamed to &lt;code&gt;cost.cacheSavingsUSD&lt;/code&gt; on every result object.&lt;/p&gt;

&lt;p&gt;The semantics are unchanged: USD the caller did not pay because they hit prompt cache. The field is still optional and only populated when the provider returned cache telemetry. What changed is the name, and the name was wrong.&lt;/p&gt;

&lt;p&gt;"Discount" implied the provider was generously giving you a price reduction. They aren't. It's your money you didn't spend because you re-sent the same context. OpenInference's &lt;code&gt;llm.cost.cache_savings&lt;/code&gt;, Helicone's dashboards, Langfuse's cost attribution — every observability vendor in the field already uses "savings". We were the odd one out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- if (result.cost.cacheDiscountUSD !== undefined) {
-   metrics.cacheSavings.record(result.cost.cacheDiscountUSD);
- }
&lt;/span&gt;&lt;span class="gi"&gt;+ if (result.cost.cacheSavingsUSD !== undefined) {
+   metrics.cacheSavings.record(result.cost.cacheSavingsUSD);
+ }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TypeScript catches every typed read site at compile time. Anything the compiler can't see (plain JavaScript, string-keyed access, stored dashboard queries) silently resolves the old field name to &lt;code&gt;undefined&lt;/code&gt;, so check those too. Migration guide with the full diff and a couple of gotchas: &lt;a href="https://github.com/baabakk/llm-ports/blob/%40llm-ports%2Fcore%400.1.0-alpha.19/docs/migration/alpha-18-to-alpha-19.md" rel="noopener noreferrer"&gt;docs/migration/alpha-18-to-alpha-19.md&lt;/a&gt;.&lt;/p&gt;
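
&lt;p&gt;For cost records that were persisted under alpha.18 (stored results, log lines), one option is a small read-side shim that prefers the new name and falls back to the old one. This is a sketch under that assumption: &lt;code&gt;CostRecord&lt;/code&gt; and &lt;code&gt;readCacheSavingsUSD&lt;/code&gt; are hypothetical names, not library exports.&lt;/p&gt;

```typescript
// Hypothetical shape for a cost record that may predate the rename.
type CostRecord = {
  cacheSavingsUSD?: number; // alpha.19 name
  cacheDiscountUSD?: number; // alpha.18 name, only present in older stored data
};

// Prefer the new field; fall back to the old one for records written before the upgrade.
// Returns undefined when the provider reported no cache telemetry under either name.
function readCacheSavingsUSD(cost: CostRecord): number | undefined {
  return cost.cacheSavingsUSD ?? cost.cacheDiscountUSD;
}

console.log(readCacheSavingsUSD({ cacheDiscountUSD: 0.12 }));
console.log(readCacheSavingsUSD({ cacheSavingsUSD: 0.3, cacheDiscountUSD: 0.12 }));
console.log(readCacheSavingsUSD({}));
```

&lt;p&gt;A shim like this belongs at the boundary where old data is read, not in new call sites, so it can be deleted once the old records age out.&lt;/p&gt;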

&lt;h2&gt;
  
  
  Worked example
&lt;/h2&gt;

&lt;p&gt;A multi-turn Anthropic conversation with a long stable system prompt and a short turn-by-turn user message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createAnthropicAdapter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@llm-ports/adapter-anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createAnthropicAdapter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createLLMPort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;longform-summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;theBookEqualsLongSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;thisTurnsShortUserQuestion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cacheControl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;manual&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;breakpoints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;ttlSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheReadTokens&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// tokens served from cache&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheWriteTokens&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// tokens committed to cache&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheSavingsUSD&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// dollars you didn't pay&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same call sites work on OpenAI and Gemini. The &lt;code&gt;cacheControl&lt;/code&gt; field is accepted everywhere; it's a no-op where the provider has no equivalent. Switching providers does not require rewriting the cache path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shape is locked. Behaviors aren't.
&lt;/h2&gt;

&lt;p&gt;This is the part of the contract worth understanding.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;shape&lt;/strong&gt; of &lt;code&gt;CacheControl&lt;/code&gt; is locked at alpha.19. Every field, every literal, every union member is committed. &lt;code&gt;beta.0&lt;/code&gt; ships it unchanged.&lt;/p&gt;

&lt;p&gt;Per-mode adapter behaviors mature across beta minors without breaking the shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better static-block detection for Anthropic's &lt;code&gt;mode: "auto"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Helicone proxy header forwarding for &lt;code&gt;namespace&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Gemini &lt;code&gt;createCachedContent&lt;/code&gt; handle lifecycle helpers (creation, refresh, expiration) under &lt;code&gt;@llm-ports/capabilities&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you write call sites against this shape today, the breakpoint placement the Anthropic adapter does in beta.1 is a "more correct" placement of the same markers your code already specified. Your call sites don't change. The shape is the contract; the implementation matures behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's coming
&lt;/h2&gt;

&lt;p&gt;The next 18 days finish the shape-lock sequence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;alpha.20 (Wed 2026-06-17).&lt;/strong&gt; &lt;code&gt;BudgetScope&lt;/code&gt;. Five tiers from tenant → customer → user → agent → session. Minute-grain windows because some providers (Cerebras's 30 RPM, Groq's 10 RPM on certain models) have rate limits that can't be expressed in &lt;code&gt;cost:N/day&lt;/code&gt; no matter how creatively you squint at the math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;alpha.21 (Sat 2026-06-20).&lt;/strong&gt; Observability hook signatures aligned with OpenTelemetry's &lt;code&gt;gen_ai.*&lt;/code&gt; semantic conventions. &lt;code&gt;onCost&lt;/code&gt;, &lt;code&gt;onTokenUsage&lt;/code&gt;, &lt;code&gt;onFallback&lt;/code&gt;, &lt;code&gt;onValidationRetry&lt;/code&gt;, &lt;code&gt;onCacheHit&lt;/code&gt;. Drop-in Langfuse / Phoenix / OpenLLMetry / Datadog wire-up. Zero adapter code on your side.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beta.0 (Tue 2026-06-30).&lt;/strong&gt; Scope-closed. Contract goes load-bearing. Surface stops moving.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then beta minors stop locking and start delivering things that should have existed already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;beta.1 (Tue 2026-07-28).&lt;/strong&gt; Resilient fallback preset (no more rolling your own circuit breaker around &lt;code&gt;runtimeFallback.shouldFallback&lt;/code&gt;). Adapter-anthropic stops wasting your money on extended-thinking models that spend their entire output budget on hidden reasoning. Auto multi-turn &lt;code&gt;cache_control&lt;/code&gt; placement for stable system prompts. Four capability factories you've been writing yourself: &lt;code&gt;createDetector&lt;/code&gt;, &lt;code&gt;createTagger&lt;/code&gt;, &lt;code&gt;createAnswerer&lt;/code&gt;, &lt;code&gt;createResponder&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beta.2 (Tue 2026-08-11).&lt;/strong&gt; Persistent budget backend. Your cost gates survive process restart. Token-denominated mode for &lt;code&gt;"$X per 1M tokens per tenant per day"&lt;/code&gt;. Pluggable &lt;code&gt;CacheBackend&lt;/code&gt; so your exact-prompt response cache lives where you want it (Redis, file, your own KV). Three more factories: &lt;code&gt;createRewriter&lt;/code&gt;, &lt;code&gt;createDecider&lt;/code&gt;, &lt;code&gt;createExpander&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beta.3 (Tue 2026-08-25).&lt;/strong&gt; &lt;code&gt;RerankPort&lt;/code&gt; gets its first adapter. Three harder factories: &lt;code&gt;createRedactor&lt;/code&gt; for PII-safe prompts, &lt;code&gt;createAgent&lt;/code&gt; ergonomics so multi-turn tool-use stops feeling like assembling IKEA without instructions, capability-wrapper around &lt;code&gt;RerankPort&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1.0.0 (Tue 2026-09-08).&lt;/strong&gt; The contract goes load-bearing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Test stats
&lt;/h2&gt;

&lt;p&gt;626 passing across 7 packages, up from ~615 in alpha.18. 11 new tests in &lt;a href="https://github.com/baabakk/llm-ports/blob/%40llm-ports%2Fcore%400.1.0-alpha.19/packages/core/tests/cache-control.test.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/tests/cache-control.test.ts&lt;/code&gt;&lt;/a&gt; cover the shape lock and the rename. Two existing cost tests updated. Zero behavioral regressions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub release: &lt;a href="https://github.com/baabakk/llm-ports/releases/tag/%40llm-ports%2Fcore%400.1.0-alpha.19" rel="noopener noreferrer"&gt;&lt;code&gt;@llm-ports/core@0.1.0-alpha.19&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cache concept doc: &lt;a href="https://github.com/baabakk/llm-ports/blob/%40llm-ports%2Fcore%400.1.0-alpha.19/docs/concepts/cache.md" rel="noopener noreferrer"&gt;docs/concepts/cache.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Migration guide: &lt;a href="https://github.com/baabakk/llm-ports/blob/%40llm-ports%2Fcore%400.1.0-alpha.19/docs/migration/alpha-18-to-alpha-19.md" rel="noopener noreferrer"&gt;docs/migration/alpha-18-to-alpha-19.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Discussion thread: &lt;a href="https://github.com/baabakk/llm-ports/discussions/42" rel="noopener noreferrer"&gt;#42&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/baabakk/llm-ports" rel="noopener noreferrer"&gt;github.com/baabakk/llm-ports&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Found a real gap in the four modes before beta.0? File an issue. Additive fields are still on the table; structural changes aren't.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Stop scattering LLM SDK/API calls across your codebase. Here is the 2-file rule that fixed mine</title>
      <dc:creator>Babak Abbaschian </dc:creator>
      <pubDate>Sat, 23 May 2026 05:48:40 +0000</pubDate>
      <link>https://dev.to/baabbakk/stop-scattering-llm-sdkapi-calls-across-your-codebase-here-is-the-2-file-rule-that-fixed-mine-54on</link>
      <guid>https://dev.to/baabbakk/stop-scattering-llm-sdkapi-calls-across-your-codebase-here-is-the-2-file-rule-that-fixed-mine-54on</guid>
      <description>&lt;p&gt;I upgraded an LLM SDK and expected a routine version bump.&lt;/p&gt;

&lt;p&gt;Instead I had to touch 15+ files, fix breaking changes across four providers, and spend the rest of the day hoping I had not missed one. That was the second time it happened. I knew there would be a third.&lt;/p&gt;

&lt;p&gt;If you have ever shipped a production LLM system, you probably recognize the smell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An SDK minor version renames &lt;code&gt;maxTokens&lt;/code&gt; to &lt;code&gt;maxOutputTokens&lt;/code&gt; and now 15 files break at runtime, not compile time.&lt;/li&gt;
&lt;li&gt;Switching one classification task from Claude to a cheaper model means editing import paths and type signatures in business logic.&lt;/li&gt;
&lt;li&gt;You have written &lt;code&gt;classifyEmail&lt;/code&gt;, &lt;code&gt;scoreLead&lt;/code&gt;, &lt;code&gt;triageTicket&lt;/code&gt;, and &lt;code&gt;categorizeRequest&lt;/code&gt;, and they are all the same function with a different prompt string.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not an SDK problem. It is an architecture problem. Here is how I fixed it, and the open-source library that came out of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2-file rule
&lt;/h2&gt;

&lt;p&gt;I made one rule: only two files in the entire codebase are allowed to import the LLM SDK. One adapter that translates my interface into SDK calls, and one provider registry that creates clients from config. Everything else talks to a typed interface and has no idea which provider, model, or SDK is in play.&lt;/p&gt;

&lt;p&gt;This is just hexagonal architecture (ports and adapters, per Alistair Cockburn) applied to LLMs. You already do this for databases and message queues. Nobody scatters raw SQL across business logic. LLM providers belong in the same category. They are infrastructure, not application logic.&lt;/p&gt;

&lt;p&gt;The dependency flow goes from this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application code
  ├─ direct SDK call
  ├─ direct SDK call
  └─ model router leaking SDK types
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application code
  ↓  llmClassify(), llmDraft(), llmScore() ...
Capabilities
  ↓
LLM Port  (TypeScript interface, zero SDK imports)
  ↓
Adapters + Provider Registry  (the only 2 files that touch the SDK)
  ↓
OpenAI / Anthropic / Gemini / Ollama / Vercel AI SDK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The caller says &lt;strong&gt;what&lt;/strong&gt; it wants (&lt;code&gt;taskType: "triage"&lt;/code&gt;). The infrastructure decides &lt;strong&gt;how&lt;/strong&gt;. No model name parameter. No provider parameter. Policy is deferred to config.&lt;/p&gt;

&lt;h2&gt;
  
  
  The proof: an SDK upgrade that did not hurt
&lt;/h2&gt;

&lt;p&gt;The real test came during a major SDK version jump with breaking changes (&lt;code&gt;maxTokens&lt;/code&gt; to &lt;code&gt;maxOutputTokens&lt;/code&gt;, &lt;code&gt;CoreMessage&lt;/code&gt; to &lt;code&gt;ModelMessage&lt;/code&gt;, and more). Here is what the migration commit looked like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 files changed (the adapter and the agent runtime), plus 1 minor fix.&lt;/li&gt;
&lt;li&gt;All 18 activity files unchanged.&lt;/li&gt;
&lt;li&gt;All 10 agent files unchanged.&lt;/li&gt;
&lt;li&gt;The final migration deleted more code than it added: 192 insertions, 688 deletions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;28 out of 31 files did not change, because they do not know the SDK exists. If a core dependency upgrade touches your business logic, your boundaries are wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part that surprised me: the same 7 operations, everywhere
&lt;/h2&gt;

&lt;p&gt;I started this to isolate the SDK. Then I noticed the bigger problem. I was not calling LLMs in 21 different places. I was reimplementing the same seven cognitive operations with slight variations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;What you give it&lt;/th&gt;
&lt;th&gt;What you get back&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Classify&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;content + rubric&lt;/td&gt;
&lt;td&gt;one label from an enum + reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;content + rubric + axes&lt;/td&gt;
&lt;td&gt;numeric ratings per axis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Draft&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;persona + situation&lt;/td&gt;
&lt;td&gt;longer text in a chosen tone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Summarize&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;long content + length target&lt;/td&gt;
&lt;td&gt;shorter content, key points kept&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extract&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;unstructured text + schema&lt;/td&gt;
&lt;td&gt;a typed structured object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;goal + constraints&lt;/td&gt;
&lt;td&gt;an ordered list of steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analyze&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;evidence + question&lt;/td&gt;
&lt;td&gt;recommendation with caveats&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Five activities classified content with five different prompt structures. Nine drafted messages with nine different tone injections. Same operation, no shared implementation. When I improved one classification prompt, I had to remember to update four other places. I usually forgot.&lt;/p&gt;

&lt;p&gt;You are not writing 47 prompts. You are writing 7 prompts, 47 times, with slightly different ingredients.&lt;/p&gt;

&lt;p&gt;So I extracted them into capability factories. A factory takes the invariant parts (schema, rubric, model routing, observability hooks) and returns a function that takes only the varying part (the content):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClassifier&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@llm-ports/capabilities&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;IntentSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;question&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;complaint&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;feedback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;other&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="na"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;normal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;classifyIntent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClassifier&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;// your provider-agnostic port&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IntentSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;schemaName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user-intent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;rubric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`
    question: asking for information
    request: wants something done
    complaint: reports a problem
    feedback: opinion only
    other: anything else
  `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then every call site, across all your files, is the same shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;classifyIntent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// { intent: "request", urgency: "high", reasoning: "..." }  fully typed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Improve the rubric once, and every classifier in the system gets better. Prompt engineering stops being scattered strings and becomes a reusable system asset.&lt;/p&gt;

&lt;h2&gt;
  
  
  llm-ports
&lt;/h2&gt;

&lt;p&gt;I pulled this pattern out of my production system and shipped it as an open-source, MIT-licensed TypeScript library: &lt;strong&gt;llm-ports&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  60-second setup
&lt;/h3&gt;

&lt;p&gt;Configure providers in &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_PROVIDER_FAST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic|&amp;lt;model&amp;gt;|cost:50/day
&lt;span class="nv"&gt;LLM_PROVIDER_SMART&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic|&amp;lt;model&amp;gt;|cost:200/day
&lt;span class="nv"&gt;LLM_TASK_ROUTE_TRIAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;fast,smart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
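
&lt;p&gt;To make the shape of those provider values concrete, here is a minimal parsing sketch. It assumes each value reads as &lt;code&gt;adapter|model|cost:N/period&lt;/code&gt;, which is inferred from the example above. &lt;code&gt;ProviderSpec&lt;/code&gt;, &lt;code&gt;parseProviderSpec&lt;/code&gt;, and the &lt;code&gt;example-model&lt;/code&gt; name are hypothetical and not part of the llm-ports API.&lt;/p&gt;

```typescript
// Illustrative only: the real llm-ports parser may accept more fields or formats.
type ProviderSpec = {
  adapter: string;
  model: string;
  limitUSD: number;
  period: string; // for example "day"
};

// Split "adapter|model|cost:N/period" into its three parts and validate each.
function parseProviderSpec(value: string): ProviderSpec {
  const [adapter, model, cost] = value.split("|");
  if (!adapter || !model || !cost) {
    throw new Error(`Expected adapter|model|cost:N/period, got "${value}"`);
  }
  // Accept a whole or decimal USD amount followed by a period name.
  const match = /^cost:(\d+(?:\.\d+)?)\/(\w+)$/.exec(cost);
  if (!match) {
    throw new Error(`Unrecognized cost limit "${cost}"`);
  }
  return { adapter, model, limitUSD: Number(match[1]), period: match[2] };
}

// "example-model" is a placeholder, not a real model name.
const fast = parseProviderSpec("anthropic|example-model|cost:50/day");
console.log(fast.adapter, fast.limitUSD, fast.period);
```

&lt;p&gt;In practice &lt;code&gt;createRegistryFromEnv&lt;/code&gt; reads these variables for you. The sketch is only meant to show what each segment of the value carries.&lt;/p&gt;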



&lt;p&gt;Create the port once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createRegistryFromEnv&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@llm-ports/core&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createAnthropicAdapter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@llm-ports/adapter-anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createRegistryFromEnv&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;adapters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createAnthropicAdapter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;getPort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it anywhere, with no SDK imports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;triage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Classify this email...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The registry selects the right model for the task, enforces cost limits, falls back through the provider chain on budget exhaustion, and records usage, cost, and latency.&lt;/p&gt;
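
&lt;p&gt;The selection and fallback behavior described above can be pictured roughly like this. It is a simplified sketch under loud assumptions: a flat per-call cost, an in-memory spend counter, and hypothetical names (&lt;code&gt;Provider&lt;/code&gt;, &lt;code&gt;pickProvider&lt;/code&gt;, &lt;code&gt;BudgetExhaustedError&lt;/code&gt;) that are not the library's actual internals.&lt;/p&gt;

```typescript
// Hypothetical sketch: these types and names are illustrative, not the llm-ports API.
type Provider = {
  name: string;
  dailyLimitUSD: number;
  costPerCallUSD: number; // flat per-call cost, a simplification
};

class BudgetExhaustedError extends Error {
  tried: string[];
  constructor(tried: string[]) {
    super(`All providers over budget: ${tried.join(", ")}`);
    this.name = "BudgetExhaustedError";
    this.tried = tried;
  }
}

// Running spend per provider name for the current day.
const spentUSD: { [name: string]: number } = {};

// Walk the chain in order and pick the first provider with budget left.
function pickProvider(chain: Provider[]): Provider {
  const tried: string[] = [];
  for (const p of chain) {
    const spent = spentUSD[p.name] ?? 0;
    if (p.dailyLimitUSD - spent >= p.costPerCallUSD) {
      spentUSD[p.name] = spent + p.costPerCallUSD;
      return p;
    }
    tried.push(p.name);
  }
  // Nothing in the chain can afford the call: fail with a typed error.
  throw new BudgetExhaustedError(tried);
}

const chain: Provider[] = [
  { name: "fast", dailyLimitUSD: 50, costPerCallUSD: 30 },
  { name: "smart", dailyLimitUSD: 200, costPerCallUSD: 150 },
];

// The first call fits the "fast" budget; the second falls back to "smart".
const picks = [pickProvider(chain).name, pickProvider(chain).name];
console.log(picks.join(" then "));
```

&lt;p&gt;A third call in this sketch would throw, because neither provider has enough budget left. That is the point where a caller can catch the typed error and decide what to do next.&lt;/p&gt;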

&lt;h3&gt;
  
  
  What you get
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-provider routing&lt;/strong&gt; across OpenAI, Anthropic, Google Gemini, Ollama, and the Vercel AI SDK.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback chains&lt;/strong&gt; that move to the next provider in the chain when one exceeds its budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;USD-based cost gating&lt;/strong&gt; with hourly, daily, and monthly limits. Budget exhaustion is a typed exception, not a surprise invoice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 7 capability factories&lt;/strong&gt;: &lt;code&gt;createClassifier&lt;/code&gt;, &lt;code&gt;createScorer&lt;/code&gt;, &lt;code&gt;createDrafter&lt;/code&gt;, &lt;code&gt;createSummarizer&lt;/code&gt;, &lt;code&gt;createExtractor&lt;/code&gt;, &lt;code&gt;createPlanner&lt;/code&gt;, &lt;code&gt;createAnalyzer&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation recovery&lt;/strong&gt; for structured output. If a model returns invalid JSON or a value outside the enum, the capability retries automatically with a correction prompt. Bad output stops at the capability boundary instead of leaking downstream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-use safety primitives&lt;/strong&gt;: destructive markers, confirmation-required actions, max output byte limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability hooks&lt;/strong&gt; for cost, latency, quality, and outcomes.&lt;/li&gt;
&lt;li&gt;No runtime dependency on LangChain or LlamaIndex. Core plus one adapter plus capabilities is a small install footprint, with strict TypeScript throughout.&lt;/li&gt;
&lt;/ul&gt;
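
&lt;p&gt;The validation recovery item in the list above is easier to picture with a small loop. This is a hedged sketch, not the library's implementation: the model is stubbed as a synchronous function to keep it short (real calls are asynchronous), the validator is hand-rolled rather than schema-driven, and &lt;code&gt;classifyWithRecovery&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, and &lt;code&gt;flakyModel&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Hypothetical sketch, not the llm-ports implementation.
// The model is stubbed as a synchronous function to keep the example short.
type Model = (prompt: string) => string;

const URGENCIES = ["low", "normal", "high"];

// Returns an error message for invalid output, or null when it is valid.
function validate(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "output was not valid JSON";
  }
  const urgency = (parsed as { urgency?: unknown } | null)?.urgency;
  if (typeof urgency !== "string" || !URGENCIES.includes(urgency)) {
    return `urgency must be one of ${URGENCIES.join(", ")}`;
  }
  return null;
}

function classifyWithRecovery(model: Model, prompt: string, maxRetries = 2): string {
  let currentPrompt = prompt;
  let attemptsLeft = maxRetries + 1;
  let lastError = "";
  while (attemptsLeft > 0) {
    attemptsLeft -= 1;
    const raw = model(currentPrompt);
    const error = validate(raw);
    if (error === null) return raw;
    lastError = error;
    // Feed the validation error back so the model can correct itself.
    currentPrompt = `${prompt}\n\nYour previous answer was rejected: ${error}. Reply with corrected JSON only.`;
  }
  // Invalid output never leaves this function; the caller gets an error instead.
  throw new Error(`Validation failed after retries: ${lastError}`);
}

// Stub model: wrong enum value on the first call, valid on the second.
let calls = 0;
const flakyModel: Model = () => {
  calls += 1;
  return calls === 1 ? '{"urgency":"urgent"}' : '{"urgency":"high"}';
};

const output = classifyWithRecovery(flakyModel, "Classify this email...");
console.log(output, "after", calls, "calls");
```

&lt;p&gt;The design choice worth noting is that the loop either returns validated output or throws, so downstream code never has to re-check the shape.&lt;/p&gt;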

&lt;h3&gt;
  
  
  How it compares
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vercel AI SDK&lt;/strong&gt; unifies provider calls. llm-ports adds the registry, fallback chains, USD cost gating, validation recovery, and capability factories on top. There is an adapter to migrate from it incrementally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM&lt;/strong&gt; is a Python-first HTTP proxy. llm-ports is TypeScript and runs in-process, with no extra network hop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portkey&lt;/strong&gt; is a commercial hosted gateway. llm-ports is MIT and has no hosted dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain.js&lt;/strong&gt; is a framework. llm-ports is a lightweight architecture and control layer, not a framework you build your whole app inside.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to use it (and when not to)
&lt;/h3&gt;

&lt;p&gt;Use it if you run 2+ providers (or might switch later), have 5+ call sites, keep getting bitten by SDK upgrades, or need cost control and centralized quality tracking.&lt;/p&gt;

&lt;p&gt;Skip it if you have 1 or 2 LLM calls, you are just prototyping, or you want a full agent framework with a built-in memory and RAG layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Honest status
&lt;/h3&gt;

&lt;p&gt;llm-ports is pre-release, currently at &lt;code&gt;0.1.0-alpha.5&lt;/code&gt;. The core architecture is stable with 250+ offline regression tests, but some adapter and agent paths are still being hardened (multi-turn agent in the Vercel adapter and retry-on-runtime-error both land in v0.2). The per-surface status is documented openly so you know what is solid before you adopt it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @llm-ports/core @llm-ports/adapter-anthropic @llm-ports/capabilities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/@llm-ports/core" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@llm-ports/core&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub (7 runnable examples, including email triage and PDF extraction): &lt;a href="https://github.com/baabakk/llm-ports" rel="noopener noreferrer"&gt;https://github.com/baabakk/llm-ports&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://baabakk.github.io/llm-ports/" rel="noopener noreferrer"&gt;https://baabakk.github.io/llm-ports/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the capability-factory pattern matches how you are building, I would genuinely like feedback in GitHub Discussions. What shapes are you reimplementing that are not on the list of seven? What knobs do the capabilities need that they do not have yet?&lt;/p&gt;

&lt;p&gt;The LLM stops being a dependency you manage. It becomes infrastructure you configure. Once you make that shift, everything else gets simpler.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Based on two longer write-ups: &lt;a href="https://babak.ai/posts/ports-and-adapters-for-ai-how-i-decoupled-my-entire-codebase-from-the-llm-sdks" rel="noopener noreferrer"&gt;Ports and Adapters for AI&lt;/a&gt; and &lt;a href="https://babak.ai/posts/the-7-llm-capabilities-every-production-ai-system-reimplements" rel="noopener noreferrer"&gt;The 7 LLM Capabilities Every Production AI System Reimplements&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
