<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alaric</title>
    <description>The latest articles on DEV Community by Alaric (@llm_link_top).</description>
    <link>https://dev.to/llm_link_top</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3930750%2Fec0813fe-3a60-4305-84b4-1c8830f63622.png</url>
      <title>DEV Community: Alaric</title>
      <link>https://dev.to/llm_link_top</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/llm_link_top"/>
    <language>en</language>
    <item>
      <title>5 Tips to Cut Claude Code Token Usage by 30%</title>
      <dc:creator>Alaric</dc:creator>
      <pubDate>Mon, 18 May 2026 02:21:43 +0000</pubDate>
      <link>https://dev.to/llm_link_top/5-tips-to-cut-claude-code-token-usage-by-30-1o15</link>
      <guid>https://dev.to/llm_link_top/5-tips-to-cut-claude-code-token-usage-by-30-1o15</guid>
      <description>&lt;p&gt;I've been using Claude Code daily for the past few months. The output quality is great, but the API bill at the end of the month was painful. After some experimentation, I found a few habits that consistently cut my token consumption by 25–35% without sacrificing code quality.&lt;/p&gt;

&lt;p&gt;Sharing them here in case anyone else is in the same boat.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Put a &lt;code&gt;CLAUDE.md&lt;/code&gt; at the project root
&lt;/h2&gt;

&lt;p&gt;Claude Code reads &lt;code&gt;CLAUDE.md&lt;/code&gt; automatically on startup and treats it as durable context. Without it, Claude has to re-discover your project structure on every session — that's a lot of file-reading tokens.&lt;/p&gt;

&lt;p&gt;A minimal template that works for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: &amp;lt;name&amp;gt;&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Language: Go 1.22 / TypeScript 5
&lt;span class="p"&gt;-&lt;/span&gt; Framework: Gin / React 19
&lt;span class="p"&gt;-&lt;/span&gt; DB: PostgreSQL via GORM

&lt;span class="gu"&gt;## Layout&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`controller/`&lt;/span&gt; — HTTP handlers
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`service/`&lt;/span&gt;    — business logic
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`model/`&lt;/span&gt;      — DB models

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`common.Marshal`&lt;/span&gt; instead of &lt;span class="sb"&gt;`encoding/json`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All new code must compile under &lt;span class="sb"&gt;`go vet`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep it under 200 lines. Anything longer and Claude will start spending tokens summarizing the file itself.&lt;br&gt;
 cost, then reads are ~10% of normal price.&lt;/p&gt;

&lt;p&gt;What this means in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't change &lt;code&gt;CLAUDE.md&lt;/code&gt; mid-session — it invalidates the cache&lt;/li&gt;
&lt;li&gt;When asking follow-up questions, append rather than rewrite the prompt&lt;/li&gt;
&lt;li&gt;For long files, paste once and refer back to "the file above" instead of re-pasting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a 200K-token project context, my cache hit rate is around 70%, which roughly cuts input cost from $0.60 to $0.18 per session.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Prefer &lt;code&gt;Read&lt;/code&gt; tool over pasting code into the prompt
&lt;/h2&gt;

&lt;p&gt;Two ways to give Claude a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A) "Here's the content: &amp;lt;paste 5000 lines&amp;gt;"
B) "Read src/foo.go"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both work, but (B) is cheaper because Claude only reads the file when it actually needs to. Often it'll read a 50-line slice instead of the whole file. With (A), you've already paid for all 5000 lines whether they were needed or not.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Use a smaller model for routine tasks
&lt;/h2&gt;

&lt;p&gt;You don't need Opus for "write a unit test for this function." Switch to Sonnet (or Haiku for trivial edits) when the task is mechanical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boilerplate generation&lt;/li&gt;
&lt;li&gt;Adding logging&lt;/li&gt;
&lt;li&gt;Renaming variables across a file&lt;/li&gt;
&lt;li&gt;Simple test cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code lets you swap models per session. For me, Sonnet handles ~70% of daily edits, and I save Opus for hard reasoning (architecture decisions, bug investigation, complex refactors).&lt;/p&gt;

&lt;p&gt;Rough cost difference per million output tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Opus 4.7: $75&lt;/li&gt;
&lt;li&gt;Sonnet 4.5: $15&lt;/li&gt;
&lt;li&gt;Haiku 4: $5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A 5x–15x saving on the 70% of routine work adds up fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  What didn't work for me
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Compress" the prompt manually&lt;/strong&gt; — too much effort, marginal savings, and Claude often misses context you compressed away&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using ultra-cheap third-party "Opus" relays&lt;/strong&gt; — twice I found out the model was actually a Chinese open-source model in disguise. Quality dropped immediately. If price looks 5x too good to be true, it usually is&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disabling Prompt Caching to "stay on the latest context"&lt;/strong&gt; — caching is invalidated automatically when context changes, so you don't gain anything by disabling it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Habit&lt;/th&gt;
&lt;th&gt;Saving&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt; at project root&lt;/td&gt;
&lt;td&gt;20–30% on first messages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One task per session + &lt;code&gt;/clear&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;10–20% on long sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use Prompt Caching&lt;/td&gt;
&lt;td&gt;30–60% on follow-ups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use &lt;code&gt;Read&lt;/code&gt; tool, don't paste&lt;/td&gt;
&lt;td&gt;10–30% on file-heavy tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet/Haiku for routine work&lt;/td&gt;
&lt;td&gt;5–15× on those tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Combined effect on my monthly bill: roughly a third of what I was paying before.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you have other tips that worked for you, I'd love to hear them in the comments.
&lt;/h2&gt;

&lt;h2&gt;
  
  
  2. Scope each session to a single task
&lt;/h2&gt;

&lt;p&gt;If you start a session with "let's refactor the auth layer and also add OAuth and also fix the rate limiter," Claude keeps all three goals in context the whole time. Every subsequent message pays for that context.&lt;/p&gt;

&lt;p&gt;Habit: &lt;strong&gt;one task per session, then &lt;code&gt;/clear&lt;/code&gt;&lt;/strong&gt;. You'll see token usage drop noticeably on the second message of each session.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Use Prompt Caching aggressively
&lt;/h2&gt;

&lt;p&gt;Both Anthropic's API and most relays support prompt caching. The way it works: stable prefixes (system prompt, file contents, project context) are cached for 5 minutes at a small one-time write&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>tip</category>
    </item>
  </channel>
</rss>
