Flat $20 AI plans are disappearing. Token billing is here. These six patterns show how to keep your costs low without slowing you down.
1. Trim Context Instead of Sending Entire Files
Most developers paste entire files or components. That explodes token usage.
Before
// sending full file (500+ lines)
import React from "react";
export default function Dashboard() {
  // 500 lines of UI, hooks, logic...
}
After
// extract only relevant function
function calculateTotals(items) {
  return items.reduce((acc, item) => {
    acc.total += item.price;
    return acc;
  }, { total: 0 });
}
Reducing context from 500 lines to 30 cuts token usage by 80 to 90 percent. The model does not need your whole app to fix one function.
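One way to make that trimming a habit is a small helper that pulls just the function you need out of a file before you build the prompt. A rough sketch, not a real parser: extractFunction and its brace counting are illustrative only, and the file path is made up.

import fs from "node:fs";

// Naive extractor: find a named function declaration, then count braces
// until it closes. Good enough for quick context trimming; use an AST
// tool if the code has tricky nesting, strings, or comments.
function extractFunction(source, name) {
  const start = source.indexOf(`function ${name}`);
  if (start === -1) return null;
  const open = source.indexOf("{", start);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}") depth--;
    if (depth === 0) return source.slice(start, i + 1);
  }
  return null;
}

// Send a handful of relevant lines instead of the whole 500-line component.
const file = fs.readFileSync("src/Dashboard.jsx", "utf8");
const snippet = extractFunction(file, "calculateTotals");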
2. Replace Conversations with Single-Shot Prompts
Long chat threads silently burn tokens. Each message re-sends previous context.
Before
Message 1: Here is my code...
Message 2: That did not work, try again...
Message 3: Now fix edge case...
Message 4: Also optimize performance...
After
Fix this function.
Requirements:
- handle empty array
- avoid mutation
- optimize for large datasets
Code:
function calculateTotals(items) { ... }
One structured prompt replaces 4 to 6 back-and-forth messages. This alone can reduce costs by 60 percent in a debugging session.
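The same idea carries over when you call the model from a script: build one structured prompt up front instead of looping through corrections. A minimal sketch; ai.generate stands in for whatever client you use, and the file path is made up.

import fs from "node:fs";

// Collect everything the model needs into a single prompt.
const requirements = [
  "handle empty array",
  "avoid mutation",
  "optimize for large datasets",
];

const code = fs.readFileSync("src/calculateTotals.js", "utf8");

const prompt = [
  "Fix this function.",
  "Requirements:",
  ...requirements.map((r) => `- ${r}`),
  "Code:",
  code,
].join("\n");

// One request instead of 4 to 6 follow-ups that each re-send the thread.
const answer = await ai.generate(prompt);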
3. Use Smaller Models for Boilerplate
Not every task needs a top-tier reasoning model.
Before
// using expensive model for simple code
Generate a basic Express route with validation
After
// use cheaper model or local LLM
app.post("/users", (req, res) => {
  if (!req.body.email) {
    return res.status(400).json({ error: "Email required" });
  }
  res.send("OK");
});
Reserve expensive models for architecture or debugging. Use smaller models for CRUD, validation, and tests. Teams doing this cut AI spend by 40 to 50 percent.
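A small routing helper makes that split automatic instead of a per-prompt decision. The model names below are placeholders, and the options argument passed to ai.generate is illustrative; adapt both to the client and providers you actually use.

// Route boilerplate to a cheap model; keep the expensive one for hard problems.
const MODELS = {
  cheap: "small-fast-model",          // placeholder: a budget or local model
  expensive: "large-reasoning-model", // placeholder: your top-tier model
};

const CHEAP_TASKS = new Set(["crud", "validation", "tests", "boilerplate"]);

function pickModel(taskType) {
  return CHEAP_TASKS.has(taskType) ? MODELS.cheap : MODELS.expensive;
}

// Boilerplate goes to the cheap model by default.
const model = pickModel("boilerplate");
const route = await ai.generate("Generate a basic Express route with validation", { model });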
4. Cache AI Responses Like API Results
Most teams recompute the same answers repeatedly.
Before
// every run calls AI
const response = await ai.generate("Generate test cases for login");
After
const cache = new Map();

async function getCachedAI(prompt) {
  if (cache.has(prompt)) return cache.get(prompt);
  const result = await ai.generate(prompt);
  cache.set(prompt, result);
  return result;
}
Caching avoids duplicate token usage. For repeated tasks like test generation, it can drop costs close to zero after the first run.
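One caveat: that Map only lives in memory, so the savings reset every run. Persisting responses to disk, keyed by a hash of the prompt, keeps them across runs. A rough sketch using Node's built-in fs and crypto modules; ai.generate is still a placeholder client, and the cache directory name is arbitrary.

import fs from "node:fs";
import crypto from "node:crypto";

const CACHE_DIR = ".ai-cache";

async function getCachedAI(prompt) {
  // Hash the prompt so it can double as a filename.
  const key = crypto.createHash("sha256").update(prompt).digest("hex");
  const file = `${CACHE_DIR}/${key}.json`;

  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  }

  const result = await ai.generate(prompt);
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result));
  return result;
}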
Caching becomes even more important when combined with the patterns from the AI productivity tradeoffs most developers ignore, where overuse of AI actually slows teams down.
5. Chunk Large Tasks Instead of One Massive Prompt
Sending large codebases in one request is expensive and inefficient.
Before
Refactor this entire 2000-line module for performance
After
Step 1: Optimize data fetching layer
Step 2: Refactor state management
Step 3: Improve rendering performance
Breaking work into steps reduces context size and improves answer quality. Each request is cheaper and more precise. In practice, this reduces token waste by 30 to 60 percent.
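In a script, the same split looks like one small request per step, each carrying only the files that step touches. A sketch with made-up file paths and the same placeholder ai.generate client:

import fs from "node:fs";

// Each step gets only the code it actually needs.
const steps = [
  { task: "Optimize the data fetching layer", files: ["src/api/fetch.js"] },
  { task: "Refactor state management", files: ["src/state/store.js"] },
  { task: "Improve rendering performance", files: ["src/components/List.jsx"] },
];

for (const step of steps) {
  const context = step.files.map((f) => fs.readFileSync(f, "utf8")).join("\n\n");
  // A small, focused request instead of one 2000-line prompt.
  const result = await ai.generate(`${step.task}\n\nCode:\n${context}`);
  console.log(result);
}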
6. Move Repetitive Work to Local Models
You do not need cloud AI for everything.
Before
// every lint fix goes through API
await ai.generate("Fix lint errors in this file");
After
# local model or tool
eslint --fix src/
Or with a local LLM:
ollama run codellama "Fix lint issues in this file"
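If you want the local model inside a script rather than on the command line, Ollama also exposes an HTTP API on localhost by default. A minimal sketch, assuming a local Ollama server with the codellama model already pulled:

// Call the local Ollama server instead of a metered cloud API.
async function fixLintLocally(code) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "codellama",
      prompt: `Fix lint issues in this file:\n\n${code}`,
      stream: false,
    }),
  });
  const data = await res.json();
  return data.response; // Ollama returns the generated text in `response`
}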
Local execution has zero marginal cost. Use cloud models only for tasks that actually require reasoning.
Closing
Start by measuring your current token usage. Then apply just two patterns: context trimming and structured prompts. You will see immediate savings. The teams that treat AI like a metered resource, not a free assistant, will keep their speed while everyone else hits limits.
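Most provider SDKs return usage metadata with every response, which makes that measurement a one-line log. A hedged sketch assuming an OpenAI-style usage object on the response; adjust the field names to whatever your client returns.

// Wrap the client so every call logs what it cost.
async function generateWithMetering(prompt) {
  const response = await ai.generate(prompt);
  // `usage` follows the OpenAI-style shape; your client may differ.
  const { prompt_tokens, completion_tokens } = response.usage ?? {};
  console.log(`tokens - prompt: ${prompt_tokens}, completion: ${completion_tokens}`);
  return response;
}

Even a rough log like this makes the waste visible, and the patterns above tell you where to cut it.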