DEV Community

Nate Voss

3 Things I Learned About Holding Context Through Long Debugging Sessions

You're four turns into debugging something with an LLM. The model just asked a clarifying question you answered two exchanges ago. You paste the error trace again. You repeat the code snippet. You restate the prior fix you attempted.

The model has continuity. Your token bill does not.

This is the cost of context collapse. And it's everywhere in production code.

Thing 1: Context is expensive, and repetition is the silent killer

Every time you re-paste the stack trace, the code snippet, and the prior fix attempt, you're paying for tokens you already paid for.

If a typical debugging session is ten turns and each turn repeats sixty percent of prior context, you're paying token costs for that sixty percent nine times. Do that ten times a day and the math starts to sting.

Here's what it looks like:

A standard debugging context (error, code, prior attempts) is about two thousand tokens. If you repeat it eight times across a session, that's sixteen thousand tokens just to restate the same problem.

Same session with optimized context reuse. Two thousand base, paid once. Three hundred per turn average for new information, so twenty-four hundred across eight turns. Add them up and you're at around four thousand four hundred tokens total.

The difference is eleven thousand six hundred tokens. Doesn't sound like much per session. Compounds fast when this is your daily work.
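The arithmetic above can be checked with a few lines of JavaScript. This is a minimal sketch using the article's round numbers; the constant and function names are made up for illustration, and it treats reused context as paid once, which is the same simplification the prose makes.

```javascript
// Token totals for one debugging session, using the article's round numbers.
const BASE_CONTEXT_TOKENS = 2000; // error, code, prior attempts
const NEW_TOKENS_PER_TURN = 300;  // average new information per turn
const TURNS = 8;

// Naive: the full base context is re-pasted on every turn.
function naiveTotal(base, turns) {
  return base * turns;
}

// Reuse: base context counted once, plus the new information on each turn.
function reuseTotal(base, perTurn, turns) {
  return base + perTurn * turns;
}

const naive = naiveTotal(BASE_CONTEXT_TOKENS, TURNS);
const reused = reuseTotal(BASE_CONTEXT_TOKENS, NEW_TOKENS_PER_TURN, TURNS);

console.log({ naive, reused, saved: naive - reused });
// prints { naive: 16000, reused: 4400, saved: 11600 }
```

Swap in your own session length and context size to see how quickly the gap widens.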

The obvious fix: don't repeat yourself. But obvious and easy are different things. Most code I see just resubmits everything. It works. It costs.

Thing 2: Prompt caching changes how you structure the conversation

Prompt caching lets you mark parts of your prompt as stable. The system instructions, the error trace, the code snippet. You cache them once. Subsequent calls that send the same prefix read it from the cache at a fraction of the normal input price.

The insight is not just about money. It's about how you architect the prompting workflow itself.

Instead of flattening every message with repeated context, you structure it as layers.

First call: establish the cache with stable context, get your initial response.

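Subsequent calls: resend the same stable prefix, byte for byte, and add only the new information. The API is stateless, so the prefix still travels with every request, but the cached part is billed at the discounted read rate instead of full price.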

Here's what that looks like:

const Anthropic = require('@anthropic-ai/sdk');

// The client reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

const systemPrompt = `You are a debugging assistant. Help diagnose and fix code issues. Be precise, suggest concrete changes, ask clarifying questions if needed.`;

const stableContext = `
## Error Context
Stack trace:
TypeError: Cannot read property 'id' of undefined
    at getUserData (./src/services/user.js:45)
    at Object.<anonymous> (./src/index.js:12)

## Code
function getUserData(user) {
  return {
    id: user.id,
    name: user.name,
    email: user.email
  };
}

## Prior Attempts
1. Added null check: if (!user) return null
2. Logged user object before line 45
3. Confirmed user is undefined when function called
`;

// Stable prefix, sent unchanged on every turn. The cache_control marker
// caches everything up to and including the block it is attached to.
// Note: prefixes shorter than the model's minimum cacheable length are not
// cached, so a context this short is for illustration only.
const system = [
  { type: 'text', text: systemPrompt },
  { type: 'text', text: stableContext, cache_control: { type: 'ephemeral' } }
];

const firstQuestion = 'Why is user undefined here? The function is called from index.js line 12.';

async function debuggingSession() {
  // Turn 1: Initial diagnosis (cache established)
  const response1 = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    system,
    messages: [{ role: 'user', content: firstQuestion }]
  });

  console.log('Turn 1:', response1.content[0].text);
  console.log('Cache created:', response1.usage.cache_creation_input_tokens);

  // Turn 2: Followup (same system prefix, so the cache is reused)
  const response2 = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    system,
    messages: [
      { role: 'user', content: firstQuestion },
      { role: 'assistant', content: response1.content[0].text },
      {
        role: 'user',
        content: "I checked the caller. It's passing an empty object. Should I validate the shape?"
      }
    ]
  });

  console.log('Turn 2:', response2.content[0].text);
  console.log('Cache read from:', response2.usage.cache_read_input_tokens);
  console.log('New input tokens:', response2.usage.input_tokens);
}

// Surface API or network errors instead of leaving an unhandled rejection.
debuggingSession().catch(console.error);

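Turn 1 pays full price. Look at the usage object. cache_creation_input_tokens shows what got cached. If it reads 0, the prefix was probably shorter than the model's minimum cacheable length (on the order of a thousand tokens or more, depending on the model). The short demo context above falls under that bar; a real session's traces and source files clear it easily.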

Turn 2 reuses it. cache_read_input_tokens shows tokens pulled from the cache. Those cost about ninety percent less than new input tokens.

You're holding the context without repeating it.
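To see what a turn really cost, you can weight the three usage counters by their relative price. The sketch below is a rough estimate, not billing logic: the read multiplier comes from the "about ninety percent less" figure above, the write multiplier is an assumption (cache writes are typically billed at a premium, so check current pricing), and the two usage objects are hypothetical values shaped like response.usage.

```javascript
// Assumed price multipliers relative to a normal input token.
const CACHE_READ_MULTIPLIER = 0.1;   // "about ninety percent less", per the article
const CACHE_WRITE_MULTIPLIER = 1.25; // assumption: verify against current pricing

// Convert a usage object into "full-price input token equivalents".
function effectiveInputTokens(usage) {
  const fresh = usage.input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const read = usage.cache_read_input_tokens ?? 0;
  return fresh + written * CACHE_WRITE_MULTIPLIER + read * CACHE_READ_MULTIPLIER;
}

// Hypothetical usage objects, shaped like response.usage from the example above.
const turn1Usage = { input_tokens: 30, cache_creation_input_tokens: 2000, cache_read_input_tokens: 0 };
const turn2Usage = { input_tokens: 330, cache_creation_input_tokens: 0, cache_read_input_tokens: 2000 };

console.log('Turn 1 equivalent:', effectiveInputTokens(turn1Usage));
console.log('Turn 2 equivalent:', effectiveInputTokens(turn2Usage));
```

Under these assumptions the first turn costs a little more than an uncached call, and every later turn costs a fraction of one. That is why caching only pays off when the prefix is reused at least once.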

Thing 3: The pitfall of assuming the cache survives beyond the session

This is the mistake I see most. Developers set up caching and assume it persists. It doesn't.

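Cache is short-lived. With the ephemeral cache type used above, an entry expires after a few minutes without a hit (five minutes by default at the time of writing). Step away from the session, come back, run the script again: cold cache. Full price.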
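A cold cache is visible in the usage object, so you can log it rather than guess. Here is a small sketch; the cacheStatus helper and its labels are hypothetical, and it only inspects the two cache counters used in the example earlier.

```javascript
// Classify a turn by what happened to the cached prefix.
function cacheStatus(usage) {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  if (read > 0) return 'warm';    // prefix was served from the cache
  if (written > 0) return 'cold'; // prefix had to be written (or rewritten)
  return 'uncached';              // nothing was cached on this call
}

// Hypothetical usage objects for illustration.
console.log(cacheStatus({ input_tokens: 30, cache_creation_input_tokens: 2000, cache_read_input_tokens: 0 }));
console.log(cacheStatus({ input_tokens: 330, cache_creation_input_tokens: 0, cache_read_input_tokens: 2000 }));
console.log(cacheStatus({ input_tokens: 500 }));
```

If a turn you expected to be warm comes back cold, the entry expired or the prefix changed. If it comes back uncached, the marked prefix was probably too short to cache at all.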

The mental model matters. Caching is for single-session optimization. If you need context to travel across sessions (the user's project state, the error they're debugging this week), that goes in your app's state or database, not the LLM cache.
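Keeping cross-session context in your own storage can be as simple as a JSON file. The sketch below is one way to do it, not a recommendation for production: the file location, the saveSession and loadSession helpers, and the shape of the state object are all assumptions made for illustration. A real app would likely use its existing database.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical location for one project's debugging state.
const STATE_FILE = path.join(os.tmpdir(), 'debug-session-state.json');

// Persist the stable context and the conversation so far.
function saveSession(state, file = STATE_FILE) {
  fs.writeFileSync(file, JSON.stringify(state, null, 2), 'utf8');
}

// Load the previous session, or null if there is none yet.
function loadSession(file = STATE_FILE) {
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

saveSession({
  stableContext: "## Error Context\nTypeError: Cannot read property 'id' of undefined",
  messages: [{ role: 'user', content: 'Why is user undefined here?' }]
});

const restored = loadSession();
console.log('Restored messages:', restored.messages.length);
```

On the next run you rebuild the request from the restored state. The first call pays to write the cache again, and the turns after it read from the cache.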

Another trap: stale context. If the code changed or the error trace got updated but your app keeps resending the old stable block, the cache will happily serve it. You're debugging the wrong code. If stable context is external or user-provided, include a hash or timestamp. When it changes, the prefix changes, the cache misses, and you rebuild.

The guardrail is simple. Manual invalidation if anything in the stable context is live data.
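One way to wire up that guardrail is to fingerprint the stable context and compare fingerprints before each session. This is a sketch under assumptions: fingerprint, buildStableBlock, and needsRebuild are hypothetical helpers, and the twelve-character truncated SHA-256 is an arbitrary choice that is short enough to log and long enough for this purpose.

```javascript
const crypto = require('crypto');

// Short content fingerprint for the stable context.
function fingerprint(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex').slice(0, 12);
}

// Prefix the context with its own version so any change alters the prompt prefix.
function buildStableBlock(rawContext) {
  const version = fingerprint(rawContext);
  return { version, text: `Context version: ${version}\n${rawContext}` };
}

// True when the live context no longer matches what was last sent.
function needsRebuild(lastSentVersion, currentRawContext) {
  return lastSentVersion !== fingerprint(currentRawContext);
}

const original = 'function getUserData(user) { return { id: user.id }; }';
const edited = 'function getUserData(user) { return user ? { id: user.id } : null; }';

const block = buildStableBlock(original);
console.log('Unchanged code needs rebuild:', needsRebuild(block.version, original));
console.log('Edited code needs rebuild:', needsRebuild(block.version, edited));
```

Store the version next to your session state. When needsRebuild returns true, regenerate the stable block from the live files before the next call.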


Multi-step debugging sessions are where this pays off. Structure it once. Run turns. Let the cache expire naturally when the session ends.

You're holding the thread. The model has continuity. Your costs don't multiply.
