<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Conrad Bogusz</title>
    <description>The latest articles on DEV Community by Conrad Bogusz (@conradbogusz).</description>
    <link>https://dev.to/conradbogusz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3397292%2F79419ff7-eed3-4842-b326-a764c89f149d.jpg</url>
      <title>DEV Community: Conrad Bogusz</title>
      <link>https://dev.to/conradbogusz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/conradbogusz"/>
    <language>en</language>
    <item>
      <title>Prompt caching vs. stateful approach</title>
      <dc:creator>Conrad Bogusz</dc:creator>
      <pubDate>Mon, 04 Aug 2025 11:09:50 +0000</pubDate>
      <link>https://dev.to/conradbogusz/prompt-caching-vs-statefull-approach-22h1</link>
      <guid>https://dev.to/conradbogusz/prompt-caching-vs-statefull-approach-22h1</guid>
      <description>&lt;p&gt;We still haven’t figured out how to optimize binge-watching all 38 seasons of The Simpsons 🍿. Luckily, there’s no real need to, but in the world of LLMs, it’s a whole different story.&lt;/p&gt;

&lt;p&gt;Here’s how some providers handle prompt caching:&lt;/p&gt;

&lt;p&gt;✅ OpenAI: Auto-caches the longest matching prefix (after 1,024 tokens, then in 128-token chunks). No config needed; up to 80% lower latency and 50% lower input costs.&lt;br&gt;
⚙️ Anthropic: Manual caching via headers (&lt;code&gt;anthropic-beta: prompt-caching-YYYY-MM-DD&lt;/code&gt; plus &lt;code&gt;cache_control&lt;/code&gt;). Works only for exact prefix matches. Reads save ~90% of cost; cache writes add ~25%.&lt;br&gt;
🔧 AWS Bedrock: Opt-in with &lt;code&gt;EnablePromptCaching=true&lt;/code&gt;, TTL of 5 minutes. Saves up to 90% on input costs and 85% on latency.&lt;br&gt;
📦 Google Vertex: Manual &lt;code&gt;CachedContent&lt;/code&gt;, billed by token-hours, up to 75% discount on reads, TTL up to 1 hour. More complex to manage.&lt;/p&gt;
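
&lt;p&gt;As a concrete illustration, here is a minimal Python sketch of the Anthropic flavor: building a Messages API request with a cacheable system prefix. Model name and beta-header date are illustrative assumptions; check Anthropic's docs for current values.&lt;/p&gt;

```python
# Sketch: an Anthropic Messages API request with prompt caching enabled.
# The model name and beta-header date below are illustrative assumptions.

def build_cached_request(api_key, system_prompt, user_message):
    """Build headers and payload for a prompt-cached Anthropic request."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        # Opt in to the prompt-caching beta (date suffix may change).
        "anthropic-beta": "prompt-caching-2024-07-31",
        "content-type": "application/json",
    }
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        # Mark the long, stable prefix as cacheable; later calls are only
        # served from cache when the prefix matches exactly.
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload

# To send: requests.post("https://api.anthropic.com/v1/messages",
#                        headers=headers, json=payload)
```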

&lt;p&gt;If you are interested in a different approach — a STATEFUL API for AI model inference — give it a try at ark-labs.cloud 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>Meet ARKLABS API: Stateful AI Inference you've never heard of</title>
      <dc:creator>Conrad Bogusz</dc:creator>
      <pubDate>Thu, 31 Jul 2025 10:14:43 +0000</pubDate>
      <link>https://dev.to/arklabscloud/meet-arklabs-api-stateful-ai-inference-you-never-heard-about-32mb</link>
      <guid>https://dev.to/arklabscloud/meet-arklabs-api-stateful-ai-inference-you-never-heard-about-32mb</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; ARK Cloud API launches today with stateful AI inference (&lt;strong&gt;almost free input tokens&lt;/strong&gt;), signup &amp;amp; model inference in under 10 s with Google SSO (no credit card needed), and &lt;strong&gt;up to 71% cost savings&lt;/strong&gt; on Stable Diffusion 3.5 Large inference — all running on &lt;strong&gt;100% EU‑based infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Why Stateful AI Matters&lt;/h2&gt;

&lt;p&gt;Most AI APIs are &lt;strong&gt;stateless&lt;/strong&gt;—meaning you resend the same context over and over, burning budget and GPU cycles. As inference demand skyrockets, this inefficiency becomes a bottleneck. Enter &lt;strong&gt;stateful inference&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Almost-Zero-Cost Input Tokens&lt;/strong&gt;
Context persists across calls, so you never overpay for tokens you’ve already sent.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized GPU Utilization&lt;/strong&gt;
Less recompute = more throughput on the same hardware.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We built ARK Cloud API to fix that. Our &lt;strong&gt;stateful mode&lt;/strong&gt; “remembers” your context, so &lt;strong&gt;input tokens cost next to nothing&lt;/strong&gt;. That means richer, longer conversations and far more efficient GPU use.&lt;/p&gt;
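
&lt;p&gt;A back-of-the-envelope sketch of why this matters: with a stateless API the whole history is resent every turn, so billed input tokens grow quadratically with conversation length; with a stateful API each token is sent once. The numbers are illustrative.&lt;/p&gt;

```python
# Illustrative comparison of billed input tokens over a multi-turn chat.

def stateless_input_tokens(turns, tokens_per_turn):
    """Total input tokens when the full history is resent every call."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # this turn's new tokens
        total += history             # the entire history is billed as input
    return total

def stateful_input_tokens(turns, tokens_per_turn):
    """Total input tokens when the server keeps the context."""
    return turns * tokens_per_turn   # each token is sent (and billed) once

# 20 turns of ~500 tokens each:
print(stateless_input_tokens(20, 500))  # 105000
print(stateful_input_tokens(20, 500))   # 10000
```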

&lt;h2&gt;Key Features&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;strong&gt;10-Second Onboarding&lt;/strong&gt;
Google SSO → Dashboard → API Key. Blink, and you’re running inference.
&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;50 000 Free Credits&lt;/strong&gt;
No credit card required. Fuel LLMs, STT, embeddings, and Stable Diffusion.
&lt;/li&gt;
&lt;li&gt;🔀 &lt;strong&gt;OpenAI-Compatible API&lt;/strong&gt;
Swap endpoints, keep your existing code.
&lt;/li&gt;
&lt;li&gt;🇪🇺 &lt;strong&gt;100% EU Infrastructure&lt;/strong&gt;
GDPR-strong, no logs, no stored data.
&lt;/li&gt;
&lt;li&gt;💸 &lt;strong&gt;Pay-As-You-Go&lt;/strong&gt;
Only pay for output tokens and compute time.
&lt;/li&gt;
&lt;li&gt;🎨 &lt;strong&gt;Cheapest Stable Diffusion&lt;/strong&gt;
Best price on the market for Stable Diffusion&lt;/li&gt;
&lt;/ul&gt;
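
&lt;p&gt;Because the API is OpenAI-compatible, "swap endpoints, keep your code" boils down to changing the base URL. A minimal sketch below — the base URL is a placeholder assumption; the real one is shown in the ARK Cloud dashboard.&lt;/p&gt;

```python
# Sketch: pointing an existing OpenAI-style client at an OpenAI-compatible
# endpoint. Only the base URL changes; the request shape stays the same.

def build_chat_request(base_url, api_key, model, prompt):
    """Assemble an OpenAI-style chat completions request."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload

# Placeholder base URL (assumption, not the documented endpoint):
# url, headers, payload = build_chat_request(
#     "https://api.ark-labs.cloud/v1", KEY, "some-model", "Hello")
# requests.post(url, headers=headers, json=payload)
```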

&lt;p&gt;Getting started at ark-labs.cloud:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign in with Google (⏱️ 10s)&lt;/li&gt;
&lt;li&gt;Claim your 50 000 free credits&lt;/li&gt;
&lt;li&gt;Point your existing calls at the ARK Cloud API&lt;/li&gt;
&lt;li&gt;Scale with confidence—no hidden fees, total privacy&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
