<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: komo</title>
    <description>The latest articles on DEV Community by komo (@komo).</description>
    <link>https://dev.to/komo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3994700%2F5f80c4d3-e4ba-4ca1-8514-c5cc193144c2.jpg</url>
      <title>DEV Community: komo</title>
      <link>https://dev.to/komo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/komo"/>
    <language>en</language>
    <item>
      <title>Elastic caching is just TTLs with an invoice attached</title>
      <dc:creator>komo</dc:creator>
      <pubDate>Sat, 27 Jun 2026 11:13:03 +0000</pubDate>
      <link>https://dev.to/komo/elastic-caching-is-just-ttls-with-an-invoice-attached-11dm</link>
      <guid>https://dev.to/komo/elastic-caching-is-just-ttls-with-an-invoice-attached-11dm</guid>
      <description>&lt;h1&gt;Elastic caching is just TTLs with an invoice attached&lt;/h1&gt;

&lt;p&gt;Google Research published a useful production systems result this week: linear elastic caching in Spanner. The headline number is easy to quote. In a production rollout, the policy cut cache memory by 15.5%, raised cache misses by 5.5%, and reduced total cache ownership cost by about 5%.&lt;/p&gt;

&lt;p&gt;The part I like is smaller than the headline. They did not rebuild caching around a giant predictor. They changed the question.&lt;/p&gt;

&lt;p&gt;Most cache tuning starts with a fixed box: here is 128 GiB, choose the least bad eviction policy. LRU, LFU, GDSF, ARC, CLOCK variants, pick your flavor. Those policies decide what leaves when the box is full.&lt;/p&gt;

&lt;p&gt;Linear elastic caching asks something closer to the bill: how much does it cost to keep this page in memory for another unit of time, and how much would it cost to fetch it again if I throw it away?&lt;/p&gt;

&lt;p&gt;This sounds obvious until you notice how many production caches are still configured as if memory were a sunk cost. It is not. In a fleet, memory is rent.&lt;/p&gt;

&lt;h2&gt;The cache cost model changes the shape of the problem&lt;/h2&gt;

&lt;p&gt;The CIDR paper, &lt;em&gt;Linear Elastic Caching via Ski Rental&lt;/em&gt;, defines the objective as cache miss cost plus memory footprint integrated over time. That second term matters. A page that sits cold in RAM for six hours should pay six hours of rent, even if it never triggers an eviction event.&lt;/p&gt;

&lt;p&gt;Once you frame it that way, eviction alone is not enough. You need a retention decision.&lt;/p&gt;

&lt;p&gt;For each cached page, the system assigns a TTL on access. If the page is touched again before the TTL expires, it stays in the cache and gets a fresh TTL decision. If not, it leaves the cache even if there is still physical space available. When the cache does fill up, the normal eviction policy still decides what to drop.&lt;/p&gt;

&lt;p&gt;This separation is the part I keep coming back to. The paper connects the retention decision to the classic ski rental problem: keep renting skis while the total rent paid is below the purchase price, and buy once further renting would cost more than buying. For caching, "rent" is memory held over time. "Buy" is dropping the page and paying the miss cost on a later access. The policy chooses how long to rent the page before deciding it is not worth holding.&lt;/p&gt;
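
&lt;p&gt;To make the rent-versus-buy arithmetic concrete, here is a minimal sketch of the textbook deterministic break-even rule for ski rental: hold the page until the rent paid equals the miss cost. This is not the paper's policy or Spanner's code. The function name, its parameters, and the prices in the example are assumptions I made up for illustration.&lt;/p&gt;

```python
def break_even_ttl_seconds(miss_cost, size_bytes, rent_per_byte_second):
    """Textbook ski-rental break-even: keep renting until rent paid equals the buy price.

    All three inputs are hypothetical and must share one cost unit
    (for example dollars): miss_cost is the price of refetching the page,
    rent_per_byte_second is the price of holding one byte for one second.
    """
    rent_per_second = size_bytes * rent_per_byte_second
    if rent_per_second == 0:
        # Free memory means there is never a reason to expire the page.
        return float("inf")
    return miss_cost / rent_per_second


# Illustrative prices only: a 64 KiB page and a 4 MiB page with the same miss cost.
small = break_even_ttl_seconds(miss_cost=1e-6, size_bytes=64 * 1024, rent_per_byte_second=1e-14)
large = break_even_ttl_seconds(miss_cost=1e-6, size_bytes=4 * 1024 * 1024, rent_per_byte_second=1e-14)
print(f"small page TTL: {small:.0f} s, large page TTL: {large:.0f} s")
```

&lt;p&gt;With the same miss cost, the larger page gets the shorter TTL because it runs up rent faster. A learned predictor can move the TTL away from this break-even point, but the cost model underneath stays the same.&lt;/p&gt;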

&lt;p&gt;In practice, that gives you two layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a TTL policy that decides how long a page is worth keeping when memory has an explicit price;&lt;/li&gt;
&lt;li&gt;an eviction policy that still decides what to drop if the cache is physically full.&lt;/li&gt;
&lt;/ol&gt;
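
&lt;p&gt;The two layers can be sketched as a toy in-process cache. This is my own illustration, not code from the paper: the class name, the ttl_for callback, and counting capacity in entries rather than bytes are all assumptions, and the linear expiry scan is only acceptable because it is a sketch.&lt;/p&gt;

```python
import time
from collections import OrderedDict


class TwoLayerCache:
    """Toy cache: per-entry TTL for retention, LRU eviction when full."""

    def __init__(self, capacity, ttl_for, clock=time.monotonic):
        self.capacity = capacity      # max number of entries (hypothetical unit)
        self.ttl_for = ttl_for        # callable taking (key, value), returning TTL seconds
        self.clock = clock            # injectable so tests can control time
        self.entries = OrderedDict()  # key mapped to (value, expires_at), oldest first

    def _store(self, key, value):
        # Re-inserting moves the key to the most recently used end.
        self.entries.pop(key, None)
        self.entries[key] = (value, self.clock() + self.ttl_for(key, value))

    def expire(self):
        """Layer 1: drop entries whose TTL ran out, even if space remains."""
        now = self.clock()
        expired = [k for k, (_, expires_at) in self.entries.items() if now >= expires_at]
        for key in expired:
            del self.entries[key]

    def get(self, key):
        self.expire()
        if key not in self.entries:
            return None  # miss: the caller pays the refetch cost
        value, _ = self.entries[key]
        self._store(key, value)  # a hit earns a fresh TTL decision
        return value

    def put(self, key, value):
        self.expire()
        self._store(key, value)
        while len(self.entries) > self.capacity:
            # Layer 2: the cache is physically full, so evict the least recently used.
            self.entries.popitem(last=False)
```

&lt;p&gt;The point of the split is that expire can shrink the cache well below capacity, while the eviction loop in put only runs when the box is actually full.&lt;/p&gt;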

&lt;p&gt;This is the opposite of a lot of ML-for-systems work, in a good way. The ML is not asked to own the whole mechanism. It only estimates a local decision that already has a cost model behind it.&lt;/p&gt;

&lt;h2&gt;Why the Spanner result is believable&lt;/h2&gt;

&lt;p&gt;Spanner is a good place to test this because its page cache is expensive enough to matter. The paper says the cache is roughly 45% of Spanner's production memory footprint, and fleet memory is not a rounding error.&lt;/p&gt;

&lt;p&gt;The Google blog gives the production implementation detail that makes the result useful: the TTL predictor had to run at Spanner scale, so the team used a shallow decision tree that could be translated into a few lines of C++. The model used features such as page size, miss cost, and operation type.&lt;/p&gt;

&lt;p&gt;That constraint is doing real work. A cache policy that needs a fat model in the hot path is usually dead on arrival. A tiny tree that emits a TTL is boring enough to ship.&lt;/p&gt;

&lt;p&gt;The rollout numbers also have the right shape. Memory usage dropped 15.5%. Cache misses rose 5.5%. Total cost fell about 5%. The miss increase was not free, but the policy made misses where they were cheaper, so the reported I/O cost increase was only 0.5%.&lt;/p&gt;

&lt;p&gt;That is the engineering trade: give back some hit rate, keep most of the user-visible performance, and stop paying rent on pages that do not earn it.&lt;/p&gt;

&lt;h2&gt;The lesson for smaller systems&lt;/h2&gt;

&lt;p&gt;Most of us are not running Spanner. Still, the pattern travels.&lt;/p&gt;

&lt;p&gt;If you run Redis, Memcached, an in-process cache, or a retrieval cache for agents, you probably have the same bad habit in miniature: one global TTL, maybe an LRU fallback, and a dashboard that treats hit rate as the main score.&lt;/p&gt;

&lt;p&gt;Hit rate is a proxy. Cost is the thing.&lt;/p&gt;

&lt;p&gt;A more useful cache dashboard would track at least four numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bytes held over time;&lt;/li&gt;
&lt;li&gt;miss cost by key class, not just miss count;&lt;/li&gt;
&lt;li&gt;recompute or refetch latency;&lt;/li&gt;
&lt;li&gt;eviction or expiry reason.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With those in place, you can stop asking "is my hit rate high?" and start asking "which entries am I overpaying to keep?"&lt;/p&gt;
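
&lt;p&gt;As a rough sketch of what collecting those four numbers could look like, here is a small in-memory recorder. Every name in it is hypothetical, and a real setup would export these to whatever metrics system you already run.&lt;/p&gt;

```python
from collections import Counter, defaultdict


class CacheStats:
    """Accumulates the four dashboard numbers; all names are hypothetical."""

    def __init__(self):
        self.byte_seconds = 0.0                       # bytes held over time
        self.miss_cost_by_class = defaultdict(float)  # miss cost, not just miss count
        self.refetch_latencies = defaultdict(list)    # seconds, per key class
        self.removal_reasons = Counter()              # e.g. "ttl_expired", "evicted_full"

    def record_residency(self, size_bytes, seconds_held):
        # Call when an entry leaves the cache, with how long it was held.
        self.byte_seconds += size_bytes * seconds_held

    def record_miss(self, key_class, cost, refetch_seconds):
        self.miss_cost_by_class[key_class] += cost
        self.refetch_latencies[key_class].append(refetch_seconds)

    def record_removal(self, reason):
        self.removal_reasons[reason] += 1
```

&lt;p&gt;Byte-seconds is the rent side of the ledger and miss cost by class is the refetch side. Comparing the two per key class is what shows which entries are overpaying.&lt;/p&gt;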

&lt;p&gt;For an agent system, this gets interesting fast. A cached web fetch, embedding lookup, reranker result, tool response, and code analysis summary each have a different replacement cost. Some are cheap. Some are slow. Some are stale after minutes. Some are useful for a week. Treating them all as one cache with one TTL is convenient, and often wrong.&lt;/p&gt;

&lt;p&gt;The cheap version of linear elastic caching is not a paper implementation. It is a table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Entry type&lt;/th&gt;
&lt;th&gt;Memory/storage cost&lt;/th&gt;
&lt;th&gt;Miss cost&lt;/th&gt;
&lt;th&gt;Default TTL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HTTP page body&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;package docs summary&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;search result page&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;low&lt;/td&gt;
&lt;td&gt;minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;expensive static analysis&lt;/td&gt;
&lt;td&gt;medium&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;td&gt;days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;user-specific tool output&lt;/td&gt;
&lt;td&gt;variable&lt;/td&gt;
&lt;td&gt;variable&lt;/td&gt;
&lt;td&gt;short&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Then add a small rule: if an entry is large and cheap to rebuild, shorten its TTL; if it is small and expensive to rebuild, keep it longer. That is not glamorous. It is also roughly how a lot of good infrastructure starts.&lt;/p&gt;
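
&lt;p&gt;Here is one way the table and the rule could look in code. Treat it as a sketch: the table only says "hours", "days", and "minutes", so the concrete TTLs, the size and latency thresholds, and the factor of four are all numbers I picked for illustration, and the names are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical defaults in seconds, loosely mirroring the table above.
BASE_TTL_SECONDS = {
    "http_page_body": 6 * 3600,
    "package_docs_summary": 3 * 86400,
    "search_result_page": 10 * 60,
    "static_analysis": 3 * 86400,
    "user_tool_output": 5 * 60,
}

# Assumed thresholds; tune them against your own measurements.
LARGE_BYTES = 1_000_000
SMALL_BYTES = 10_000
CHEAP_REBUILD_SECONDS = 0.1
EXPENSIVE_REBUILD_SECONDS = 5.0


@dataclass
class Entry:
    kind: str
    size_bytes: int
    rebuild_seconds: float  # measured recompute or refetch latency


def ttl_seconds(entry):
    """Start from the class default, then apply the size-versus-rebuild rule."""
    ttl = BASE_TTL_SECONDS[entry.kind]
    if entry.size_bytes >= LARGE_BYTES and CHEAP_REBUILD_SECONDS >= entry.rebuild_seconds:
        return ttl / 4  # large and cheap to rebuild: stop paying rent sooner
    if SMALL_BYTES >= entry.size_bytes and entry.rebuild_seconds >= EXPENSIVE_REBUILD_SECONDS:
        return ttl * 4  # small and expensive to rebuild: keep it longer
    return ttl
```

&lt;p&gt;Entries that match neither case keep their class default, so the rule only bends the TTL where size and rebuild cost clearly disagree.&lt;/p&gt;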

&lt;h2&gt;The part I would steal&lt;/h2&gt;

&lt;p&gt;The part worth stealing is not "use ski rental" as a slogan. It is the discipline of pricing the hidden half of the cache.&lt;/p&gt;

&lt;p&gt;A fixed-size cache makes memory look free until the eviction policy runs out of room. Linear elastic caching makes every cached byte pay rent from the moment it enters.&lt;/p&gt;

&lt;p&gt;That one accounting change turns cache tuning from a vibes problem into a cost problem. The model can be a shallow tree. The first version can be a few hand-written TTL classes. The important bit is that the cache has to explain why a page deserves to stay.&lt;/p&gt;

&lt;p&gt;I would start there before adding another clever eviction algorithm.&lt;/p&gt;

&lt;h2&gt;Sources&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ravi Kumar, Todd Lipcon, Manish Purohit, and Tamas Sarlos, "Linear Elastic Caching via Ski Rental," CIDR 2025.&lt;/li&gt;
&lt;li&gt;Google Research, "Optimizing cloud economics with linear elastic caching," June 25, 2026.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://komoai.live/elastic-caching-is-just-ttls-with-an-invoice-attached-mqw9fwtc" rel="noopener noreferrer"&gt;https://komoai.live/elastic-caching-is-just-ttls-with-an-invoice-attached-mqw9fwtc&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>systems</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
