Flat $20 AI plans are disappearing. Token billing is here. These six patterns show how to keep your costs low without slowing you down.
1. Trim Context Instead of Sending Entire Files
Most developers paste entire files or components. That explodes token usage.
Before
// sending full file (500+ lines)
import React from "react";
export default function Dashboard() {
  // 500 lines of UI, hooks, logic...
}
After
// extract only relevant function
function calculateTotals(items) {
  return items.reduce((acc, item) => {
    acc.total += item.price;
    return acc;
  }, { total: 0 });
}
Reducing context from 500 lines to 30 cuts token usage by 80 to 90 percent. The model does not need your whole app to fix one function.
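One way to make that trimming a habit is a small helper that pulls just the function you need out of a file before you build the prompt. A rough sketch, not a real parser: extractFunction and its brace counting are illustrative only, and the file path is made up.

import fs from "node:fs";

// Naive extractor: find a named function declaration, then count braces
// until it closes. Good enough for quick context trimming; use an AST
// tool if the code has tricky nesting, strings, or comments.
function extractFunction(source, name) {
  const start = source.indexOf(`function ${name}`);
  if (start === -1) return null;
  const open = source.indexOf("{", start);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}") depth--;
    if (depth === 0) return source.slice(start, i + 1);
  }
  return null;
}

// Send a handful of relevant lines instead of the whole 500-line component.
const file = fs.readFileSync("src/Dashboard.jsx", "utf8");
const snippet = extractFunction(file, "calculateTotals");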
2. Replace Conversations with Single-Shot Prompts
Long chat threads silently burn tokens. Each message re-sends previous context.
Before
Message 1: Here is my code...
Message 2: That did not work, try again...
Message 3: Now fix edge case...
Message 4: Also optimize performance...
After
Fix this function.
Requirements:
- handle empty array
- avoid mutation
- optimize for large datasets
Code:
function calculateTotals(items) { ... }
One structured prompt replaces 4 to 6 back-and-forth messages. This alone can reduce costs by 60 percent in a debugging session.
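The same idea carries over when you call the model from a script: build one structured prompt up front instead of looping through corrections. A minimal sketch; ai.generate stands in for whatever client you use, and the file path is made up.

import fs from "node:fs";

// Collect everything the model needs into a single prompt.
const requirements = [
  "handle empty array",
  "avoid mutation",
  "optimize for large datasets",
];

const code = fs.readFileSync("src/calculateTotals.js", "utf8");

const prompt = [
  "Fix this function.",
  "Requirements:",
  ...requirements.map((r) => `- ${r}`),
  "Code:",
  code,
].join("\n");

// One request instead of 4 to 6 follow-ups that each re-send the thread.
const answer = await ai.generate(prompt);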
3. Use Smaller Models for Boilerplate
Not every task needs a top-tier reasoning model.
Before
// using expensive model for simple code
Generate a basic Express route with validation
After
// use cheaper model or local LLM
app.post("/users", (req, res) => {
  if (!req.body.email) {
    return res.status(400).json({ error: "Email required" });
  }
  res.send("OK");
});
Reserve expensive models for architecture or debugging. Use smaller models for CRUD, validation, and tests. Teams doing this cut AI spend by 40 to 50 percent.
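A small routing helper makes that split automatic instead of a per-prompt decision. The model names below are placeholders, and the options argument passed to ai.generate is illustrative; adapt both to the client and providers you actually use.

// Route boilerplate to a cheap model; keep the expensive one for hard problems.
const MODELS = {
  cheap: "small-fast-model",          // placeholder: a budget or local model
  expensive: "large-reasoning-model", // placeholder: your top-tier model
};

const CHEAP_TASKS = new Set(["crud", "validation", "tests", "boilerplate"]);

function pickModel(taskType) {
  return CHEAP_TASKS.has(taskType) ? MODELS.cheap : MODELS.expensive;
}

// Boilerplate goes to the cheap model by default.
const model = pickModel("boilerplate");
const route = await ai.generate("Generate a basic Express route with validation", { model });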
4. Cache AI Responses Like API Results
Most teams recompute the same answers repeatedly.
Before
// every run calls AI
const response = await ai.generate("Generate test cases for login");
After
const cache = new Map();

async function getCachedAI(prompt) {
  if (cache.has(prompt)) return cache.get(prompt);
  const result = await ai.generate(prompt);
  cache.set(prompt, result);
  return result;
}
Caching avoids duplicate token usage. For repeated tasks like test generation, it can drop costs close to zero after the first run.
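One caveat: that Map only lives in memory, so the savings reset every run. Persisting responses to disk, keyed by a hash of the prompt, keeps them across runs. A rough sketch using Node's built-in fs and crypto modules; ai.generate is still a placeholder client, and the cache directory name is arbitrary.

import fs from "node:fs";
import crypto from "node:crypto";

const CACHE_DIR = ".ai-cache";

async function getCachedAI(prompt) {
  // Hash the prompt so it can double as a filename.
  const key = crypto.createHash("sha256").update(prompt).digest("hex");
  const file = `${CACHE_DIR}/${key}.json`;

  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  }

  const result = await ai.generate(prompt);
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result));
  return result;
}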
Caching becomes even more important when combined with the patterns from the AI productivity tradeoffs most developers ignore, where overuse of AI actually slows teams down.
5. Chunk Large Tasks Instead of One Massive Prompt
Sending large codebases in one request is expensive and inefficient.
Before
Refactor this entire 2000-line module for performance
After
Step 1: Optimize data fetching layer
Step 2: Refactor state management
Step 3: Improve rendering performance
Breaking work into steps reduces context size and improves answer quality. Each request is cheaper and more precise. In practice, this reduces token waste by 30 to 60 percent.
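In a script, the same split looks like one small request per step, each carrying only the files that step touches. A sketch with made-up file paths and the same placeholder ai.generate client:

import fs from "node:fs";

// Each step gets only the code it actually needs.
const steps = [
  { task: "Optimize the data fetching layer", files: ["src/api/fetch.js"] },
  { task: "Refactor state management", files: ["src/state/store.js"] },
  { task: "Improve rendering performance", files: ["src/components/List.jsx"] },
];

for (const step of steps) {
  const context = step.files.map((f) => fs.readFileSync(f, "utf8")).join("\n\n");
  // A small, focused request instead of one 2000-line prompt.
  const result = await ai.generate(`${step.task}\n\nCode:\n${context}`);
  console.log(result);
}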
6. Move Repetitive Work to Local Models
You do not need cloud AI for everything.
Before
// every lint fix goes through API
await ai.generate("Fix lint errors in this file");
After
# local model or tool
eslint --fix src/
Or with a local LLM:
ollama run codellama "Fix lint issues in this file"
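If you want the local model inside a script rather than on the command line, Ollama also exposes an HTTP API on localhost by default. A minimal sketch, assuming a local Ollama server with the codellama model already pulled:

// Call the local Ollama server instead of a metered cloud API.
async function fixLintLocally(code) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "codellama",
      prompt: `Fix lint issues in this file:\n\n${code}`,
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response; // Ollama returns the generated text in `response`
}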
Local execution has zero marginal cost. Use cloud models only for tasks that actually require reasoning.
Closing
Start by measuring your current token usage. Then apply just two patterns: context trimming and structured prompts. You will see immediate savings. The teams that treat AI like a metered resource, not a free assistant, will keep their speed while everyone else hits limits.
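Most provider SDKs return usage metadata with every response, which makes that measurement a one-line log. A hedged sketch assuming an OpenAI-style usage object on the response; adjust the field names to whatever your client returns.

// Wrap the client so every call logs what it cost.
async function generateWithMetering(prompt) {
  const response = await ai.generate(prompt);
  // `usage` follows the OpenAI-style shape; your client may differ.
  const { prompt_tokens, completion_tokens } = response.usage ?? {};
  console.log(`tokens - prompt: ${prompt_tokens}, completion: ${completion_tokens}`);
  return response;
}

Even a rough log like this makes the waste visible, and the patterns above tell you where to cut it.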