<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: wartzar-bee</title>
    <description>The latest articles on DEV Community by wartzar-bee (@wartzarbee).</description>
    <link>https://dev.to/wartzarbee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3958842%2F5a58ffae-e997-4cb4-9cf2-8e5fc1122dbd.png</url>
      <title>DEV Community: wartzar-bee</title>
      <link>https://dev.to/wartzarbee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wartzarbee"/>
    <language>en</language>
    <item>
      <title>The OAuth refresh-token race that logs your users out — and the two-layer fix</title>
      <dc:creator>wartzar-bee</dc:creator>
      <pubDate>Fri, 29 May 2026 17:00:27 +0000</pubDate>
      <link>https://dev.to/wartzarbee/the-oauth-refresh-token-race-that-logs-your-users-out-and-the-two-layer-fix-3obf</link>
      <guid>https://dev.to/wartzarbee/the-oauth-refresh-token-race-that-logs-your-users-out-and-the-two-layer-fix-3obf</guid>
      <description>&lt;p&gt;Your auth has worked for months. Then you ship a small change — a page that fires a few API calls in parallel, a worker pool, a second CLI instance, an agent — and suddenly users get logged out at random. The logs say &lt;code&gt;invalid_grant&lt;/code&gt;. Sometimes it's worse: &lt;code&gt;refresh_token_reused&lt;/code&gt;, and a working session is nuked everywhere.&lt;/p&gt;

&lt;p&gt;Nothing in your token &lt;em&gt;flow&lt;/em&gt; is wrong. The bug is that you're doing the correct flow &lt;strong&gt;concurrently&lt;/strong&gt; with a token that only tolerates being used once.&lt;/p&gt;

&lt;h2&gt;The race, step by step&lt;/h2&gt;

&lt;p&gt;An OAuth2 client holds a short-lived &lt;strong&gt;access token&lt;/strong&gt; and a long-lived &lt;strong&gt;refresh token&lt;/strong&gt;. When the access token expires, you POST the refresh token to the token endpoint and get a new access token.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;refresh-token rotation&lt;/strong&gt; — now the default at Okta, Auth0, Microsoft, and Salesforce, and recommended by the OAuth 2.0 Security BCP for public clients — that refresh token is &lt;strong&gt;single-use&lt;/strong&gt;. The refresh response carries a &lt;em&gt;new&lt;/em&gt; refresh token, and the one you just sent is invalidated the instant the first refresh succeeds.&lt;/p&gt;

&lt;p&gt;The bug appears whenever more than one request needs a token at the same time. With two callers A and B:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t0   access token is expired (or within the skew window)
t1   caller A reads creds, sees "expired", POSTs refresh_token = R0
t2   caller B reads creds, sees "expired", POSTs refresh_token = R0   // same token!
t3   provider processes A: issues access A1 + rotates R0 -&amp;gt; R1, REVOKES R0
t4   provider processes B: R0 is revoked  -&amp;gt;  400 invalid_grant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both callers did exactly what the textbook says. The loser of the race presented a token the winner already rotated away. That's the &lt;code&gt;invalid_grant&lt;/code&gt;.&lt;/p&gt;
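
&lt;p&gt;The timeline above can be reproduced without a real provider. The sketch below is a minimal simulation: &lt;code&gt;makeRotatingProvider&lt;/code&gt; and &lt;code&gt;raceTwoCallers&lt;/code&gt; are hypothetical names, and the in-memory "provider" only mimics single-use rotation; it is not any vendor's API.&lt;/p&gt;

```javascript
// Hypothetical in-memory stand-in for a rotating token endpoint (not a real provider API).
function makeRotatingProvider() {
  let current = "R0";   // the only refresh token the provider will accept
  let counter = 0;
  return {
    async refresh(refreshToken) {
      await new Promise((resolve) => setTimeout(resolve, 10)); // simulated network latency
      if (refreshToken !== current) {
        const err = new Error("invalid_grant");
        err.code = "invalid_grant";
        throw err;
      }
      counter += 1;
      current = "R" + counter;   // rotate: the token just presented is now dead
      return { access_token: "A" + counter, refresh_token: current, expires_in: 3600 };
    },
  };
}

// Two callers read the same stored token and refresh at the same time.
async function raceTwoCallers() {
  const provider = makeRotatingProvider();
  const stored = { refresh_token: "R0" };
  const results = await Promise.allSettled([
    provider.refresh(stored.refresh_token),   // caller A
    provider.refresh(stored.refresh_token),   // caller B presents the same token
  ]);
  return results.map((r) => (r.status === "fulfilled" ? "ok" : r.reason.code));
}

raceTwoCallers().then((outcome) => console.log(outcome));
```

&lt;p&gt;One caller succeeds and the other is rejected with &lt;code&gt;invalid_grant&lt;/code&gt;, even though neither did anything wrong on its own.&lt;/p&gt;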

&lt;h3&gt;Why it can be worse than a stray error&lt;/h3&gt;

&lt;p&gt;Some providers (Okta, Auth0, Salesforce) run &lt;strong&gt;refresh-token reuse detection&lt;/strong&gt;. Presenting an already-rotated refresh token looks &lt;em&gt;identical&lt;/em&gt; to a stolen token being replayed — the provider can't tell your innocent race from an attack — so it does the safe thing and &lt;strong&gt;revokes the entire refresh-token family&lt;/strong&gt;, logging the user out everywhere.&lt;/p&gt;

&lt;p&gt;That's the difference between a retryable hiccup and a support ticket. On these providers, serializing refresh isn't an optimization — it's a correctness requirement.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; &lt;code&gt;invalid_grant&lt;/code&gt; &lt;em&gt;reads&lt;/em&gt; like "the user is logged out, re-auth them." Under concurrency it usually means "a sibling request already refreshed; your copy is stale." Re-authenticating on every concurrency-induced &lt;code&gt;invalid_grant&lt;/code&gt; produces exactly the "surprise re-login" symptom you're trying to kill.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;The fix has two layers, and people ship only one&lt;/h2&gt;

&lt;p&gt;The whole fix reduces to one rule: &lt;strong&gt;make exactly one refresh happen, and have every other caller use its result instead of starting their own.&lt;/strong&gt; But there are two &lt;em&gt;scopes&lt;/em&gt;, and using the wrong-scope fix is the #1 reason the bug "comes back" after you thought you fixed it.&lt;/p&gt;

&lt;h3&gt;Layer 1: In-process single-flight (one process, many concurrent calls)&lt;/h3&gt;

&lt;p&gt;The first caller to see expiry starts the refresh and stores the in-flight &lt;code&gt;Promise&lt;/code&gt;. Every other caller &lt;code&gt;await&lt;/code&gt;s that &lt;em&gt;same&lt;/em&gt; promise instead of starting its own. JavaScript's single-threaded event loop makes "check the flag, set the promise" atomic as long as no &lt;code&gt;await&lt;/code&gt; sits between the check and the assignment, so no lock is needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;inflight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// the single shared refresh promise (null when idle)&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadCreds&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;        &lt;span class="c1"&gt;// { access_token, refresh_token, expires_at }&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SKEW_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;// refresh ~1 min before real expiry&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;SKEW_MS&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getValidToken&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// fast path, no refresh&lt;/span&gt;

  &lt;span class="c1"&gt;// SINGLE-FLIGHT: if a refresh is already running, await THAT one.&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;inflight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;inflight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;doRefresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mergeRotation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;  &lt;span class="c1"&gt;// merge: keep the old refresh_token if the response omits one&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;inflight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;          &lt;span class="c1"&gt;// clear so the next expiry can refresh&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;inflight&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// every concurrent caller awaits the SAME promise&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details that are easy to get wrong, and both bite in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clear the promise in &lt;code&gt;finally&lt;/code&gt;, not &lt;code&gt;then&lt;/code&gt;.&lt;/strong&gt; Otherwise a &lt;em&gt;failed&lt;/em&gt; refresh leaves a rejected promise wedged in &lt;code&gt;inflight&lt;/code&gt; forever, and every future call re-rejects with the stale error — a "stuck promise." &lt;code&gt;finally&lt;/code&gt; clears it on success &lt;em&gt;and&lt;/em&gt; failure so the next call retries cleanly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store the promise &lt;em&gt;before&lt;/em&gt; the first &lt;code&gt;await&lt;/code&gt;.&lt;/strong&gt; Assign &lt;code&gt;inflight&lt;/code&gt; synchronously, so a second caller arriving on the next microtask actually sees it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With 50 callers hitting an expired token, exactly &lt;strong&gt;one&lt;/strong&gt; refresh runs and the other 49 await it. If your token lives in one process — a server, a single worker, a browser tab — single-flight plus rotation-merge (below) is the &lt;em&gt;complete&lt;/em&gt; fix. You do not need a lock file.&lt;/p&gt;
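
&lt;p&gt;The "50 callers, one refresh" claim is easy to check. This is a minimal sketch, assuming a hypothetical &lt;code&gt;fakeRefresh&lt;/code&gt; that stands in for the real token request and counts how often it runs; the single-flight logic mirrors the snippet above.&lt;/p&gt;

```javascript
// Hypothetical refresh function that counts how often it is called.
let refreshCalls = 0;
async function fakeRefresh() {
  refreshCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated network latency
  return { access_token: "A" + refreshCalls, expires_at: Date.now() + 3600 * 1000 };
}

let inflight = null;   // shared in-flight promise, null when idle
let creds = null;      // start with no valid credentials

async function getValidToken() {
  if (creds !== null) {
    if (creds.expires_at > Date.now()) return creds.access_token;   // fast path
  }
  if (inflight === null) {
    // Assigned synchronously, before any await, so later callers see it.
    inflight = fakeRefresh()
      .then((next) => { creds = next; return next; })
      .finally(() => { inflight = null; });   // cleared on success and on failure
  }
  return (await inflight).access_token;
}

async function fiftyCallers() {
  const calls = Array.from({ length: 50 }, () => getValidToken());
  const tokens = await Promise.all(calls);
  return { refreshCalls, distinctTokens: new Set(tokens).size };
}

fiftyCallers().then((summary) => console.log(summary));
```

&lt;p&gt;All 50 calls resolve to the same access token, and the counter shows a single refresh.&lt;/p&gt;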

&lt;h3&gt;Layer 2: Cross-process lock (many processes share one credential)&lt;/h3&gt;

&lt;p&gt;Here's the part people miss. An in-process lock — a shared promise, an async mutex, a library's internal lock — coalesces refreshes &lt;em&gt;within one event loop&lt;/em&gt;. Two separate processes each have their own memory and their own &lt;code&gt;inflight&lt;/code&gt; variable. &lt;strong&gt;They cannot see each other's in-flight refresh.&lt;/strong&gt; Two CLIs, two workers, two containers, or two agents reading the same credential file are right back in the race; single-flight did nothing for them.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your topology&lt;/th&gt;
&lt;th&gt;What you need&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One process, concurrent calls / request fan-out&lt;/td&gt;
&lt;td&gt;In-process single-flight (+ rotation-merge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One token file shared by multiple CLIs / workers / agents&lt;/td&gt;
&lt;td&gt;Single-flight &lt;em&gt;per process&lt;/em&gt; + a cross-process lock + re-read + atomic write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many machines sharing one credential&lt;/td&gt;
&lt;td&gt;A distributed lock (Redis/DB) or a token-broker service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the multi-process case you need three things &lt;em&gt;together&lt;/em&gt;: an exclusive lock, a &lt;strong&gt;re-read after acquiring it&lt;/strong&gt;, and an &lt;strong&gt;atomic write&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getValidTokenMultiProcess&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// fast path, no lock&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;withTokenLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;                  &lt;span class="c1"&gt;// O_EXCL lock file: one process wins&lt;/span&gt;
    &lt;span class="nx"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                         &lt;span class="c1"&gt;// *** RE-READ inside the lock ***&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// a sibling already refreshed -&amp;gt; done&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mergeRotation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doRefresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeTokenAtomic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                      &lt;span class="c1"&gt;// temp file + rename (atomic swap)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;re-read after acquiring the lock&lt;/strong&gt; is the step everyone forgets — and it's the whole point. By the time you &lt;em&gt;get&lt;/em&gt; the lock, the process that held it before you may have already refreshed. If you blindly refresh anyway, you send a just-rotated token and reproduce the exact &lt;code&gt;invalid_grant&lt;/code&gt; you were trying to avoid, only now serialized. Re-read, and if it's already fresh, &lt;em&gt;use it and skip the refresh entirely&lt;/em&gt;. That converts "two refreshes serialized" (still burns the rotated token on the second) into "one refresh + one cache hit."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Order matters:&lt;/strong&gt; lock → &lt;em&gt;re-read&lt;/em&gt; → refresh only if still stale → &lt;em&gt;atomic write&lt;/em&gt; → release. Drop the re-read and the lock just serializes the same bug. Drop the atomic write and you trade the network race for a file-corruption race.&lt;/p&gt;
&lt;/blockquote&gt;
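
&lt;p&gt;The snippet above leaves &lt;code&gt;withTokenLock&lt;/code&gt;, &lt;code&gt;readToken&lt;/code&gt;, and &lt;code&gt;writeTokenAtomic&lt;/code&gt; undefined. One possible shape for them, as a rough sketch rather than the guide's implementation: the file locations, retry counts, and delays are assumptions, and stale-lock cleanup (a crashed process leaving its lock file behind) is deliberately left out.&lt;/p&gt;

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Assumed locations; the post does not prescribe a path.
const TOKEN_PATH = path.join(os.tmpdir(), "myapp-creds.json");
const LOCK_PATH = TOKEN_PATH + ".lock";

// Exclusive lock: the "wx" flag (O_CREAT | O_EXCL) makes creation fail with
// EEXIST while another process holds the lock file, so exactly one process wins.
async function withTokenLock(fn, { retries = 100, delayMs = 50 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    let handle;
    try {
      handle = await fs.open(LOCK_PATH, "wx", 0o600);
    } catch (err) {
      if (err.code !== "EEXIST" || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs)); // wait, then try again
      continue;
    }
    try {
      return await fn();             // critical section: re-read, refresh, write
    } finally {
      await handle.close();
      await fs.unlink(LOCK_PATH);    // release the lock
    }
  }
}

// Returns null when no credentials have been stored yet.
async function readToken() {
  try {
    return JSON.parse(await fs.readFile(TOKEN_PATH, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}

// Write a temp file next to the target, then rename over it, so readers see
// either the old file or the new one and never a half-written file.
async function writeTokenAtomic(creds) {
  const tmp = TOKEN_PATH + "." + process.pid + ".tmp";
  await fs.writeFile(tmp, JSON.stringify(creds), { mode: 0o600 });
  await fs.rename(tmp, TOKEN_PATH);
}
```

&lt;p&gt;The temp file lives in the same directory as the target because a rename is only atomic within one filesystem. A production version would also need a policy for locks left behind by a crashed process, for example a timestamp inside the lock file.&lt;/p&gt;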

&lt;h2&gt;The other &lt;code&gt;invalid_grant&lt;/code&gt;: rotation-merge&lt;/h2&gt;

&lt;p&gt;Independent of locking, &lt;em&gt;how you persist the refresh response&lt;/em&gt; is its own source of &lt;code&gt;invalid_grant&lt;/code&gt;. Providers disagree on what they return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rotating providers&lt;/strong&gt; (Okta, Auth0, Microsoft, Salesforce) return a &lt;em&gt;new&lt;/em&gt; &lt;code&gt;refresh_token&lt;/code&gt; every refresh — save it, or your &lt;em&gt;next&lt;/em&gt; refresh uses a revoked token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; returns a &lt;code&gt;refresh_token&lt;/code&gt; only on the &lt;em&gt;first&lt;/em&gt; authorization; refresh responses omit it. If you overwrite stored credentials with the response as-is, you erase the refresh token and force a full re-consent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One rule, &lt;strong&gt;rotation-merge&lt;/strong&gt;, handles both: if the response carries a &lt;code&gt;refresh_token&lt;/code&gt;, use it; if it doesn't, keep the previous one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;mergeRotation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;refresh_token&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;refresh_token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;refresh_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;refresh_token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// Google omitted it -&amp;gt; keep the old one&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expires_in&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expires_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expires_in&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Naive overwrite silently works for rotating providers and silently breaks Google. Naive "always keep the old one" silently works for Google and silently breaks rotation. Merge is the only rule correct for both.&lt;/p&gt;
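
&lt;p&gt;To make the two cases concrete, here is the same rule applied to both response shapes. The function restates &lt;code&gt;mergeRotation&lt;/code&gt; from above so the snippet runs on its own; the token values are made up for illustration.&lt;/p&gt;

```javascript
// Same rule as mergeRotation above, restated so this snippet is self-contained.
function mergeRotation(prev, res) {
  const merged = { ...res };
  if (!merged.refresh_token) {
    // Response omitted the refresh token: keep the one we already had.
    if (prev?.refresh_token) merged.refresh_token = prev.refresh_token;
  }
  if (merged.expires_in) {
    if (!merged.expires_at) merged.expires_at = Date.now() + merged.expires_in * 1000;
  }
  return merged;
}

const stored = { access_token: "A0", refresh_token: "R0", expires_at: 0 };

// Rotating provider: the response carries a new refresh token, which wins.
const rotated = mergeRotation(stored, { access_token: "A1", refresh_token: "R1", expires_in: 3600 });
console.log(rotated.refresh_token);   // "R1"

// Google-style response: no refresh_token field, so the stored one is kept.
const kept = mergeRotation(stored, { access_token: "A1", expires_in: 3600 });
console.log(kept.refresh_token);      // "R0"
```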

&lt;h2&gt;One more: re-read before failing&lt;/h2&gt;

&lt;p&gt;Even with single-flight, a race can slip through across processes or at a deploy boundary. So make &lt;code&gt;invalid_grant&lt;/code&gt; handling self-healing — before you surface it as "log in again," re-read the stored token &lt;em&gt;once&lt;/em&gt;; a sibling may have just refreshed it. Recover silently if so; reserve the disruptive re-login for when the grant is &lt;em&gt;genuinely&lt;/em&gt; gone (user revoked, password changed, idle-expired).&lt;/p&gt;
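
&lt;p&gt;One way to express that decision in code. This is a sketch under assumptions: &lt;code&gt;recoverFromInvalidGrant&lt;/code&gt; and its three outcomes (&lt;code&gt;use&lt;/code&gt;, &lt;code&gt;retry&lt;/code&gt;, &lt;code&gt;reauth&lt;/code&gt;) are hypothetical names, and it treats "the stored refresh token differs from the one we sent" as the sign that a sibling rotated it.&lt;/p&gt;

```javascript
// Hypothetical helper: decide what to do after the token endpoint rejected `usedCreds`.
// `readToken` loads the stored credentials and is assumed to return null if there are none.
async function recoverFromInvalidGrant(usedCreds, readToken, now = Date.now()) {
  const latest = await readToken();   // re-read exactly once
  if (latest === null) return { action: "reauth" };

  const rotatedBySibling = latest.refresh_token !== usedCreds.refresh_token;
  if (!rotatedBySibling) {
    // The store still holds the token that was rejected: the grant is really gone.
    return { action: "reauth" };
  }
  // A sibling refreshed after we read; our copy was merely stale.
  if (latest.expires_at > now) return { action: "use", creds: latest };
  return { action: "retry", creds: latest };   // refresh once more with the newer token
}

// Example: the store already holds a newer, still-valid credential.
recoverFromInvalidGrant(
  { refresh_token: "R0" },
  async () => ({ access_token: "A1", refresh_token: "R1", expires_at: Date.now() + 60000 })
).then((result) => console.log(result.action));   // "use"
```

&lt;p&gt;Only the &lt;code&gt;reauth&lt;/code&gt; outcome should ever reach the user as a login prompt.&lt;/p&gt;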

&lt;h2&gt;The checklist&lt;/h2&gt;

&lt;p&gt;In order of leverage (items 1, 2, 6, and 7 cover the single-process case, which is most reports; items 3–5 add the multi-process case):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Refresh &lt;strong&gt;proactively&lt;/strong&gt; with a skew (30–60s before expiry) so callers don't all hit the cliff at once.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Single-flight in-process&lt;/strong&gt; — one shared in-flight &lt;code&gt;Promise&lt;/code&gt;; everyone awaits it; cleared in &lt;code&gt;finally&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;[ ] If a credential is shared across processes, take an &lt;strong&gt;exclusive lock&lt;/strong&gt; (lock file / &lt;code&gt;O_EXCL&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Re-read after acquiring the lock&lt;/strong&gt; and short-circuit if a sibling already rotated.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Persist atomically&lt;/strong&gt; — temp file + &lt;code&gt;rename&lt;/code&gt;, mode &lt;code&gt;0600&lt;/code&gt;; never write the token file in place.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Rotation-merge&lt;/strong&gt; on persist; keep the previous &lt;code&gt;refresh_token&lt;/code&gt; when the response omits one.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Re-read before failing&lt;/strong&gt; on &lt;code&gt;invalid_grant&lt;/code&gt;; only re-auth when the grant is genuinely gone.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;If you'd rather not re-derive it&lt;/h2&gt;

&lt;p&gt;The patterns above are small and the code is complete enough to copy; that's deliberate, because this is a build-it-yourself-friendly post. If you'd rather pull in a primitive, &lt;a href="https://github.com/wartzar-bee/refresh-guard" rel="noopener noreferrer"&gt;&lt;code&gt;refresh-guard&lt;/code&gt;&lt;/a&gt; is a small, MIT-licensed, &lt;strong&gt;zero-dependency&lt;/strong&gt; library that packages the &lt;strong&gt;in-process single-flight + correct rotation-merge + atomic file persistence&lt;/strong&gt; as one installable thing, with a typed provider-quirks table for the gotchas above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createTokenManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileStore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;refresh-guard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createTokenManager&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                &lt;span class="c1"&gt;// optional: picks a quirks profile&lt;/span&gt;
  &lt;span class="na"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;fileStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;~/.myapp/creds.json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;           &lt;span class="c1"&gt;// atomic temp-file + rename persistence&lt;/span&gt;
  &lt;span class="na"&gt;refresh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;TOKEN_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;form&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;refresh_token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                           &lt;span class="c1"&gt;// { access_token, expires_in, refresh_token? }&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Call from anywhere, as often as you like — exactly ONE refresh happens:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;accessToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getValidToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Honest scope:&lt;/strong&gt; it solves the &lt;em&gt;in-process&lt;/em&gt; case (single-flight) plus rotation-merge and atomic persistence. It does &lt;strong&gt;not&lt;/strong&gt; ship a cross-process lock — if you share one credential across processes, you still layer the lock-file pattern from Layer 2 around it. (Disclosure: I maintain it, and I wrote the vendor-neutral guide it's based on. The patterns work with any OAuth client, or none.)&lt;/p&gt;

&lt;p&gt;Full guide with the complete cross-process lock implementation, the provider quirks table, and an FAQ: &lt;strong&gt;&lt;a href="https://refresh-guard-guide.pages.dev/" rel="noopener noreferrer"&gt;https://refresh-guard-guide.pages.dev/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Takeaway&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;invalid_grant&lt;/code&gt; under load almost never means "the user is logged out." It means two requests refreshed the same single-use token at once. Make exactly one refresh happen — single-flight inside a process, a re-read-after-lock across processes — merge rotation correctly, and re-read before you ever force a re-login. That's the whole fix.&lt;/p&gt;

</description>
      <category>oauth</category>
      <category>security</category>
      <category>node</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Where your Claude Code bill actually goes — I measured 66 of my own sessions</title>
      <dc:creator>wartzar-bee</dc:creator>
      <pubDate>Fri, 29 May 2026 16:59:51 +0000</pubDate>
      <link>https://dev.to/wartzarbee/where-your-claude-code-bill-actually-goes-i-measured-66-of-my-own-sessions-471e</link>
      <guid>https://dev.to/wartzarbee/where-your-claude-code-bill-actually-goes-i-measured-66-of-my-own-sessions-471e</guid>
      <description>&lt;p&gt;I kept getting surprised by my Claude Code bill. Not "shocked" — surprised. A short refactor would cost about what I expected, and then some long debugging session would quietly cost ten times more, and I couldn't have told you &lt;em&gt;why&lt;/em&gt; from the dashboard. Totals don't explain themselves.&lt;/p&gt;

&lt;p&gt;So I did the boring thing: I parsed my own logs. Claude Code writes a JSONL transcript for every session under &lt;code&gt;~/.claude/projects/&lt;/code&gt;, and every model turn in there records its token counts — input, output, and crucially the &lt;strong&gt;cache-read&lt;/strong&gt; and &lt;strong&gt;cache-write&lt;/strong&gt; counts. Multiply those by the published prices and you get a per-turn, per-session cost attribution. I ran it across 66 of my real sessions (filtered to ones that actually cost something: &lt;code&gt;cost &amp;gt; 0&lt;/code&gt; and at least 3 model turns).&lt;/p&gt;
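
&lt;p&gt;The attribution step is short enough to sketch. This is not the script used for the numbers in this post. It assumes each model turn is a JSON line with a &lt;code&gt;message.usage&lt;/code&gt; object carrying &lt;code&gt;input_tokens&lt;/code&gt;, &lt;code&gt;output_tokens&lt;/code&gt;, &lt;code&gt;cache_creation_input_tokens&lt;/code&gt;, and &lt;code&gt;cache_read_input_tokens&lt;/code&gt; (check your own transcripts), and the prices are placeholders rather than real rates.&lt;/p&gt;

```javascript
// Placeholder prices in dollars per million tokens. NOT real prices:
// substitute the published rates for the model you actually use.
const PLACEHOLDER_PRICES = { input: 10, cacheWrite: 12.5, cacheRead: 1, output: 50 };

// Assumed line shape (verify against your own transcripts):
//   { "message": { "usage": { "input_tokens", "output_tokens",
//     "cache_creation_input_tokens", "cache_read_input_tokens" } } }
function attributeSession(jsonlText, prices = PLACEHOLDER_PRICES) {
  const cost = { input: 0, cacheWrite: 0, cacheRead: 0, output: 0, turns: 0, total: 0 };
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    let usage;
    try {
      usage = JSON.parse(line)?.message?.usage;
    } catch {
      continue;                 // skip a malformed or truncated line
    }
    if (!usage) continue;       // no usage object: not a model turn
    cost.turns += 1;
    cost.input += ((usage.input_tokens ?? 0) * prices.input) / 1e6;
    cost.cacheWrite += ((usage.cache_creation_input_tokens ?? 0) * prices.cacheWrite) / 1e6;
    cost.cacheRead += ((usage.cache_read_input_tokens ?? 0) * prices.cacheRead) / 1e6;
    cost.output += ((usage.output_tokens ?? 0) * prices.output) / 1e6;
  }
  cost.total = cost.input + cost.cacheWrite + cost.cacheRead + cost.output;
  return cost;
}

// Two made-up model turns and one non-model line, just to exercise the function.
const sample = [
  JSON.stringify({ message: { usage: { cache_creation_input_tokens: 1000000, output_tokens: 100000 } } }),
  JSON.stringify({ type: "user", message: { content: "hi" } }),
  JSON.stringify({ message: { usage: { cache_read_input_tokens: 1000000 } } }),
].join("\n");

console.log(attributeSession(sample));
```

&lt;p&gt;For real data, read each &lt;code&gt;.jsonl&lt;/code&gt; file under &lt;code&gt;~/.claude/projects/&lt;/code&gt; as text and pass it to &lt;code&gt;attributeSession&lt;/code&gt;; summing the per-session results gives the pooled totals.&lt;/p&gt;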

&lt;p&gt;Here's what I found, and it's more interesting than "AI is expensive."&lt;/p&gt;

&lt;h2&gt;The one number everyone quotes is two different numbers&lt;/h2&gt;

&lt;p&gt;If you've read threads about Claude Code cost, you've seen the claim that "most of your spend is re-sent context." That's true — but &lt;em&gt;how&lt;/em&gt; true depends entirely on whether you weight by &lt;strong&gt;session&lt;/strong&gt; or by &lt;strong&gt;dollar&lt;/strong&gt;, and almost nobody says which they mean.&lt;/p&gt;

&lt;p&gt;In my data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In the median session, re-sent (cached) context is only ~24% of spend.&lt;/strong&gt; Most sessions are short: around 29 model turns, peaking near 45k tokens of context. In a short session, the context hasn't been re-sent that many times yet, so the model's actual &lt;em&gt;output&lt;/em&gt; and the &lt;em&gt;newly-written&lt;/em&gt; context are a bigger relative slice. Re-sent context is a minority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pooled across all 66 sessions, re-sent context is 60% of total dollars.&lt;/strong&gt; When you weight by dollar, a handful of long, long-context sessions dominate the total — and in &lt;em&gt;those&lt;/em&gt;, the same large context gets re-sent turn after turn, so re-sent context balloons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both numbers are real. They just answer different questions. "What does my typical session look like?" → 24%. "Where did my month's money go?" → 60%. If someone quotes one without the other, they're telling half the story.&lt;/p&gt;
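
&lt;p&gt;The gap between the two weightings is easy to reproduce. The sketch below uses four made-up sessions (hypothetical numbers, not rows from my dataset) to show how one expensive session barely moves the per-session median but dominates the pooled figure.&lt;/p&gt;

```python
from statistics import median

# Hypothetical sessions: (re-sent-context dollars, total dollars).
# Made-up numbers chosen to show the effect, not taken from the dataset.
sessions = [
    (0.40, 2.00),     # short session: 20% re-sent
    (1.00, 4.00),     # short session: 25% re-sent
    (0.90, 3.00),     # short session: 30% re-sent
    (70.00, 100.00),  # one long marathon: 70% re-sent
]

# Session-weighted: every session counts once, whatever it cost.
per_session_share = [resent / total for resent, total in sessions]
median_share = median(per_session_share)

# Dollar-weighted (pooled): every dollar counts once, so big sessions dominate.
pooled_share = sum(r for r, _ in sessions) / sum(t for _, t in sessions)

print(f"median per-session share: {median_share:.1%}")
print(f"pooled share of dollars:  {pooled_share:.1%}")
```

&lt;p&gt;With these invented inputs the median share is 27.5% while the pooled share is about 66%. Same sessions, same dollars, two honest answers.&lt;/p&gt;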

&lt;p&gt;Here's the pooled split across all 66 sessions (total: &lt;strong&gt;$2,650.90&lt;/strong&gt; over &lt;strong&gt;4,339 model turns&lt;/strong&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spend category&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Re-sent context (cache-read)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New cached context (cache-write)&lt;/td&gt;
&lt;td&gt;25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output — the model actually writing&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fresh (uncached) input&lt;/td&gt;
&lt;td&gt;~0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Only &lt;strong&gt;15%&lt;/strong&gt; of my total spend was the model writing. The overwhelming majority was &lt;em&gt;moving context back and forth&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens (it's mechanical, not mysterious)
&lt;/h2&gt;

&lt;p&gt;The model is stateless. It has no memory between turns. So on every single turn, the &lt;strong&gt;entire accumulated conversation context&lt;/strong&gt; — every file you've read, every tool result, every previous message — gets sent again so the model can "remember" it.&lt;/p&gt;

&lt;p&gt;Prompt caching softens the blow: that re-send is billed at the discounted cache-read rate (roughly a tenth of full input price) rather than full freight. But you still pay it &lt;em&gt;every turn&lt;/em&gt;, on the &lt;em&gt;whole&lt;/em&gt; context. So as a session grows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context size goes up,&lt;/li&gt;
&lt;li&gt;the per-turn re-send cost goes up with it,&lt;/li&gt;
&lt;li&gt;and because you're now also taking &lt;em&gt;more&lt;/em&gt; turns in that long session, you multiply a growing per-turn cost by a growing turn count.&lt;/li&gt;
&lt;/ul&gt;
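
&lt;p&gt;A toy model of the list above, assuming the context grows by a fixed number of tokens every turn and that every re-send is billed at a placeholder cache-read price. Real sessions grow unevenly, so treat this as the shape of the curve, not a prediction.&lt;/p&gt;

```python
# Assumed cache-read price in USD per million tokens (illustrative placeholder).
CACHE_READ_PER_MTOK = 0.30


def resend_cost(turns, tokens_added_per_turn):
    """Total cache-read cost when each turn re-sends all context so far.

    Simplified model: context grows by a fixed amount every turn, and turn n
    re-sends the (n - 1) * tokens_added_per_turn tokens accumulated before it.
    """
    total_tokens_resent = sum(
        (n - 1) * tokens_added_per_turn for n in range(1, turns + 1)
    )
    return total_tokens_resent * CACHE_READ_PER_MTOK / 1_000_000


short = resend_cost(turns=30, tokens_added_per_turn=1500)
long = resend_cost(turns=300, tokens_added_per_turn=1500)
print(f"30 turns:  ${short:.2f}")
print(f"300 turns: ${long:.2f}")
print(f"ratio: {long / short:.0f}x for 10x the turns")
```

&lt;p&gt;Under these assumptions, ten times the turns costs roughly a hundred times as much in re-sends, because the total grows with the square of the turn count.&lt;/p&gt;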

&lt;p&gt;That compounding is why cost concentrates. In my set, the median session peaked at ~45k tokens of context, but the heaviest single session peaked at &lt;strong&gt;999,541 tokens&lt;/strong&gt;, and the mean peak across all sessions was 251,371, pulled far above the median by those few long sessions. The longest sessions reach more than an order of magnitude above the typical one, and every token in that context is re-sent on every following turn. They don't cost a bit more. They cost the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actually-useful takeaway
&lt;/h2&gt;

&lt;p&gt;The honest headline isn't "Claude Code is expensive." It's:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A few long sessions are expensive, and in those, re-sent context is the bill.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That reframing changes what you do about it. You don't need to micro-optimize your cheap, short sessions — they're already cheap and balanced. The leverage is entirely in the long-context marathons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/compact&lt;/code&gt; aggressively on long sessions.&lt;/strong&gt; It summarizes and drops the accumulated context, which directly shrinks the thing you're re-paying for every turn. The earlier you compact a long session, the more re-sends you avoid at the larger size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start a fresh session when the task changes.&lt;/strong&gt; Carrying a 200k-token context into an unrelated new task means you re-pay for 200k irrelevant tokens on every turn of the new work. A fresh session starts the re-send meter near zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch context growth, not just the running total.&lt;/strong&gt; The total tells you what already happened; the &lt;em&gt;per-turn context size&lt;/em&gt; tells you what each future turn will cost. When you see context climbing into the hundreds of thousands of tokens, that's the signal to compact or split — before, not after.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't over-index on cache efficiency.&lt;/strong&gt; My median cache efficiency was ~83% (pooled 98%), which sounds great. But cache efficiency only tells you how &lt;em&gt;cheap your re-sends are&lt;/em&gt; — not &lt;em&gt;how much you're re-sending&lt;/em&gt;. You can have 98% cache efficiency and still be torching money, because you're efficiently re-sending an enormous context a hundred times. The metric to watch is the re-sent-context &lt;em&gt;share of spend&lt;/em&gt;, not the cache hit rate.&lt;/li&gt;
&lt;/ul&gt;
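
&lt;p&gt;To get a feel for how much compacting can save, here is a rough simulation. It assumes linear context growth, a placeholder cache-read price, and a compaction that shrinks the context to a fixed size. It also ignores the cost of producing the summary and re-writing the cache afterwards, so the real saving will be somewhat smaller.&lt;/p&gt;

```python
CACHE_READ_PER_MTOK = 0.30  # assumed USD per million tokens, illustrative only


def resend_tokens(turns, growth, compact_every=None, compact_to=20_000):
    """Count tokens re-sent over a session under a simplified growth model.

    Context grows by `growth` tokens per turn. If `compact_every` is set, the
    context is cut down to at most `compact_to` tokens after every that-many
    turns (a stand-in for /compact; real summaries vary in size).
    """
    context = 0
    resent = 0
    for turn in range(1, turns + 1):
        resent += context   # this turn re-sends everything accumulated so far
        context += growth   # then adds its own new material
        if compact_every and turn % compact_every == 0:
            context = min(context, compact_to)
    return resent


def dollars(tokens):
    return tokens * CACHE_READ_PER_MTOK / 1_000_000


never = resend_tokens(turns=200, growth=2000)
every_50 = resend_tokens(turns=200, growth=2000, compact_every=50)
print(f"no compaction:        ${dollars(never):.2f}")
print(f"compact every 50:     ${dollars(every_50):.2f}")
```

&lt;p&gt;In this toy run the re-send bill drops from about $11.94 to about $3.84. The exact figures depend entirely on the assumed inputs; the point is that the saving comes from every later turn re-sending a smaller context.&lt;/p&gt;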

&lt;h2&gt;
  
  
  A caveat I want to be loud about
&lt;/h2&gt;

&lt;p&gt;This is &lt;strong&gt;one heavy user's data&lt;/strong&gt;: mine, from building one set of projects. It is a &lt;em&gt;reference set&lt;/em&gt;, not a census or a representative survey of all Claude Code users. The specific numbers (the $4.08 median session cost, the 24%/60% split) will shift for a lighter user, a different stack, or a different price sheet. Beyond the filter described at the top, I haven't trimmed, adjusted, or curated anything (these are the tool's raw output), but a single self-measured source is inherently narrow, so treat the &lt;em&gt;shape&lt;/em&gt; as the finding and verify the &lt;em&gt;numbers&lt;/em&gt; against your own logs.&lt;/p&gt;

&lt;p&gt;The structural part — cost concentrates in a few long sessions, and re-sent context dominates those — is a property of stateless models plus per-turn context re-send. That should hold broadly. The exact percentiles are just my sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want to measure your own
&lt;/h2&gt;

&lt;p&gt;I wrote the parser as a small CLI to do this analysis, and it's open source. To see where &lt;em&gt;your&lt;/em&gt; spend goes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @wartzar-bee/tokenscope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads your Claude Code logs locally — &lt;strong&gt;read-only, no network, no telemetry, nothing leaves your machine&lt;/strong&gt; — and prints the same breakdown: output vs. re-sent context vs. new context, the per-turn context-growth curve, and which percentile your session lands in against the 66-session reference set above. &lt;code&gt;--share&lt;/code&gt; emits a privacy-safe summary (aggregate numbers only — no file paths, no prompt or response content) you can paste into a thread. It's MIT, not affiliated with Anthropic, and the whole cost model is the few paragraphs above — so if you'd rather write your own parser, the logs are right there in &lt;code&gt;~/.claude/projects/&lt;/code&gt; and you now know what to look for.&lt;/p&gt;

&lt;p&gt;(Disclosure: I maintain tokenscope. I'm linking it because it's the tool I used to produce these exact numbers, not because you need it — the JSONL is yours and the math is simple.)&lt;/p&gt;

&lt;p&gt;The full dataset — every percentile table, the SVG charts, the complete methodology and a frank limitations section — is published here: &lt;strong&gt;&lt;a href="https://tokenscope.pages.dev/benchmark/" rel="noopener noreferrer"&gt;https://tokenscope.pages.dev/benchmark/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Don't optimize your average session; it's fine. Find your handful of long-context marathons — the ones that quietly peaked past a few hundred thousand tokens — and compact or split &lt;em&gt;those&lt;/em&gt;. That's where the 60% lives.&lt;/p&gt;

&lt;p&gt;I'd genuinely like to widen this beyond one person's logs: &lt;strong&gt;if you've measured your own Claude Code (or any agent) spend, what's &lt;em&gt;your&lt;/em&gt; split between re-sent context and actual output — and does cost concentrate in a few long sessions for you too, or is your distribution flatter?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
