Kumar Kislay

Posted on May 4

How to stop hitting Claude usage limits

#ai #promptengineering #claude

This article was originally published on https://forg.to/articles/how-to-stop-hitting-claude-usage-limits

*You're Paying for Claude. You're Also Wasting Most of It.*
I used to hit my usage limit by 2pm every day.

Not because I was doing too much work. Because I had no idea how Claude actually charges you.

Once I understood the real mechanic, everything changed. I now hit my limit maybe once a month.

Here is what nobody tells you upfront.

The one thing you need to understand first

Claude does not read just your latest message.

Every time you send a message, Claude re-reads the entire conversation from the top. Message 1. Message 2. All the way to where you are now. Every single time.

So message 1 costs almost nothing. Message 30 means Claude just re-read 29 full exchanges before even thinking about what you asked.

That is where your credits go. Not to your questions. To the history behind them.
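
To make the mechanic concrete, here is a minimal sketch of the re-reading cost. It assumes every exchange (your message plus Claude's reply) is a flat 500 tokens. That figure and the function name are illustrative assumptions, not measurements or anything Claude exposes.

```python
# Toy model: each exchange (one message plus its reply) is assumed to be
# a fixed size. Real exchanges vary; 500 is an illustrative figure.
TOKENS_PER_EXCHANGE = 500


def tokens_read_for_message(n: int, per_exchange: int = TOKENS_PER_EXCHANGE) -> int:
    """Tokens processed when message number n (1-based) is sent:
    the n - 1 earlier exchanges are re-read, plus the new exchange itself."""
    if n < 1:
        raise ValueError("message numbers start at 1")
    return n * per_exchange


if __name__ == "__main__":
    for n in (1, 10, 30):
        print(f"message {n:>2}: {tokens_read_for_message(n):,} tokens")
```

Under these assumptions, message 30 costs thirty times what message 1 did, even if the question itself is just as short.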

Every habit in this article comes back to one idea: stop paying to re-read things that no longer matter.

The habits most people don't know about

Convert files before uploading them

A single PDF page costs between 1,500 and 3,000 tokens. A screenshot can cost 1,300 tokens. DOCX and PPTX files carry invisible metadata bloat on top of that.

The fix takes two minutes. Open a blank Google Doc (type doc.new in your URL bar), paste only the text you actually need, download it as a .md file, and upload that instead.

If you upload the same 15-page PDF to four different chats, you just burned up to 180,000 tokens (15 pages × 3,000 tokens × 4 uploads) on a document that could have been 2,000 tokens of clean text.
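
The estimate follows directly from the per-page range quoted above. A short sketch of the arithmetic, where the function name is hypothetical and the 2,000-token figure for clean text is the article's own example:

```python
# Per-page range quoted above; actual counts depend on how dense each page is.
PDF_TOKENS_PER_PAGE = (1_500, 3_000)


def pdf_upload_cost(pages: int, chats: int) -> tuple[int, int]:
    """Return the (low, high) token estimate for uploading one PDF to several chats."""
    low, high = PDF_TOKENS_PER_PAGE
    return pages * low * chats, pages * high * chats


if __name__ == "__main__":
    low, high = pdf_upload_cost(pages=15, chats=4)
    clean_text = 2_000 * 4  # ~2,000 tokens of plain text, also uploaded four times
    print(f"PDF: {low:,} to {high:,} tokens; clean text: {clean_text:,} tokens")
```

For the 15-page, four-chat case this gives 90,000 to 180,000 tokens for the PDF against 8,000 for the text version.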

Plan in chat. Build in Cowork.

File creation, spreadsheets, decks, and documents cost more than regular chat messages.

So do not open Cowork and say "create me a financial model."

Open Chat first. Plan the structure. Agree on sections. Nail the assumptions. Once you know exactly what you want, move to Cowork and say "build this exact thing."

Thinking is cheap. Building is expensive. Do them in the right order.

Say "ask me questions" instead of writing long prompts

A 500-word prompt costs roughly 650 tokens or more (a token is shorter than a typical word), and you pay that again every time Claude re-reads the conversation. A 15-word prompt that triggers clarifying questions costs almost nothing.

The prompt I use for 80% of sessions: "I want to [task] to achieve [goal]. Ask me questions before you start."

Clicking options costs almost nothing. Typing paragraphs costs a lot. Let Claude pull the context out of you instead of you pushing walls of text at it.

Stop asking Claude to redo the whole thing

When section 3 of a report is wrong, do not say "redo the report."

Say "only redo section 3. Keep everything else."

Every full redo means Claude regenerates the entire output. If the report is 2,000 tokens, that is 2,000 output tokens burned again. Point to the specific problem. Fix only that.

Also: add "no commentary, no explanations, just the output" when you know exactly what you want. Every sentence of "Happy to help! Here is what I did..." is tokens you are paying for.

Edit your message instead of sending a follow-up

This is the one that changed my daily usage most.

In Chat, you can click Edit on your previous message, fix it, and regenerate. The old exchange gets replaced, not stacked.

Every time you type "no, I meant..." or "actually, change X to Y," you are adding to the history Claude re-reads forever. The edit button removes that entirely.

Batch your tasks into one message

Three separate prompts equal three full context reloads.

One prompt with three tasks equals one reload.

Instead of sending "summarize this," then "list the main points," then "suggest a headline" as three messages, write them all in one. The answer is usually better too. Claude sees the full picture at once.
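
A rough sketch of why batching helps, counting only the tokens Claude has to read on each send. The token sizes in the example are made up for illustration, and both function names are hypothetical.

```python
def separate_prompts_cost(history: int, prompts: list[int], reply: int) -> int:
    """Input tokens read when each prompt is sent as its own message.
    Every send re-reads the history plus all earlier prompts and replies."""
    total = 0
    context = history
    for prompt in prompts:
        total += context + prompt   # everything read for this send
        context += prompt + reply   # this exchange now joins the history
    return total


def batched_prompt_cost(history: int, prompts: list[int]) -> int:
    """Input tokens read when all prompts go out in a single message."""
    return history + sum(prompts)


if __name__ == "__main__":
    history = 3_000       # assumed: a pasted article already in the chat
    prompts = [5, 8, 6]   # assumed sizes of three short requests
    reply = 300           # assumed size of each reply
    print("separate:", separate_prompts_cost(history, prompts, reply))
    print("batched: ", batched_prompt_cost(history, prompts))
```

With these numbers the three separate sends read 9,937 tokens and the batched one reads 3,019. Output tokens are left out of the comparison, since Claude has to write the answers either way.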

Use the right model for the task

Opus with extended thinking is heavy machinery. Do not use heavy machinery to move a chair.

Grammar checks, brainstorming, reformatting, quick answers: Sonnet handles all of this at a fraction of the cost.

My rule: if the task takes Claude under 30 seconds to answer, it does not need Opus. Switch before you start. It takes two clicks.

Keep your context files short

If you use Cowork with a personal context file, Claude reads it before every single task. If that file is 10,000 words, that is well over 10,000 tokens burned before any real work starts. Every session.

Keep it under 2,000 words. Cut anything that does not change how Claude writes or decides.

Restart the conversation instead of correcting

A 20-message Cowork session burns roughly 105,000 tokens. A 30-message session burns 232,000.

When a session goes sideways, do not keep correcting forward. Restart from an earlier message, or start a completely fresh session with a one-line summary of what you need.

A clean slate is almost always cheaper than digging out.
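
Here is a minimal sketch of the trade-off, using a simple model in which every message re-reads everything before it. The 500 tokens per exchange and the 100-token summary are assumptions made here for illustration. With that per-exchange size the model lands on 105,000 tokens for 20 messages and 232,500 for 30, close to the figures above.

```python
def session_tokens(messages: int, per_exchange: int = 500, start_context: int = 0) -> int:
    """Total tokens processed over a session, assuming message k re-reads the
    starting context and the k - 1 earlier exchanges, then adds its own exchange."""
    return sum(start_context + k * per_exchange for k in range(1, messages + 1))


if __name__ == "__main__":
    # Option A: keep correcting forward from message 20 to message 30.
    keep_going = session_tokens(30) - session_tokens(20)
    # Option B: fresh session, 10 messages, opened with a ~100-token summary.
    restart = session_tokens(10, start_context=100)
    print(f"continue for 10 more messages: {keep_going:,} tokens")
    print(f"restart with a summary:        {restart:,} tokens")
```

In this model the ten extra messages cost 127,500 tokens when stacked on the old session and 28,500 in a fresh one.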

New topic, new chat. Always.

If you asked about a LinkedIn post, then a client proposal, then something else, all inside the same chat, Claude is still re-reading the LinkedIn conversation while thinking about the new topic.

Old messages are dead weight. Tokens spent on context that does nothing.

New topic equals new chat. Every time.

Use Projects for documents you reference often

If you upload the same PDF to five different chats, Claude re-reads that document five full times.

Use Projects instead. Upload once. Every conversation inside that project references it without burning fresh tokens. If you work with contracts, brand guides, or research papers you reference often, this alone can cut your usage significantly.

Turn off features you are not using

Web search, connectors, and extended thinking all add tokens to every response, even when you do not need them.

My default is everything off. I turn features on per task.

And when you do use connectors, be specific. "Search Slack from the last 7 days for messages about the Q2 launch" is far cheaper than "search Slack for anything about launches."

Spread your work across the day

Claude uses a rolling 5-hour window for usage limits. If you burn through everything in one morning session, most of your daily capacity goes unused.

Split into two or three sessions across the day. By the time you come back, your earlier usage has rolled off.
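
A toy model of a rolling window, assuming usage is counted as described above. The exact accounting is not public, so the function, the event format, and the token amounts here are all hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # window length described above


def usage_in_window(events: list[tuple[datetime, int]], now: datetime) -> int:
    """Sum the tokens from events that fall inside the last WINDOW before `now`."""
    cutoff = now - WINDOW
    return sum(tokens for ts, tokens in events if cutoff < ts <= now)


if __name__ == "__main__":
    day = datetime(2025, 1, 6)
    events = [
        (day.replace(hour=9), 40_000),              # morning session
        (day.replace(hour=10), 30_000),
        (day.replace(hour=15, minute=30), 10_000),  # afternoon session
    ]
    print(usage_in_window(events, day.replace(hour=11)))             # morning usage still counts
    print(usage_in_window(events, day.replace(hour=15, minute=30)))  # morning usage has rolled off
```

At 11:00 the window holds 70,000 tokens. By 15:30 the morning usage has aged out and only the 10,000 from the afternoon session remains.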

Stop using Claude for things Claude is bad at

Claude cannot generate images. If you spend five messages trying to describe a visual and getting text workarounds, you just burned tokens on a task Claude was never going to solve.

Claude is also not the fastest at real-time search. Use the right tool for the right job.

Where to start

Do not try all of this at once.

If you use Cowork daily: start with converting your files, planning in Chat before building, and stopping full redos. Those three alone will extend how long your credits last.

If you mostly use Chat: start with editing instead of correcting, keeping one topic per chat, and using Projects for recurring documents.

If you are on the base plan and keep hitting limits: batch your prompts, use cheaper models for simple tasks, and spread sessions across the day.

The goal is not to use Claude less. It is to stop paying for conversations Claude is having with itself.