
Midhun Sekhar

Stop Wasting Tokens: How to Cut Your LLM Costs by 97%

The hidden tax in your AI pipeline

If you're building with GPT or Claude, you’ve probably done this:

  • Call an API

  • Get a big JSON response

  • Send the whole thing to your LLM

Seems harmless, right?

It’s not.

You’re quietly burning money on something you don’t even use.


💸 The "metadata tax"

Let’s say your API returns this:

{
  "order": {
    "id": 123,
    "user": {
      "name": "Midhun",
      "email": "midhun@email.com"
    },
    "items": [ ... 100 objects ... ],
    "metadata": { ... tons of fields ... }
  }
}

Now ask yourself:

👉 What does your LLM actually need?

Probably just this:

{
  "name": "Midhun",
  "email": "midhun@email.com"
}

🤯 Here’s the problem

LLMs don’t care what’s useful.

You’re billed for every token you send.

  • Full JSON → ~1500 tokens

  • Useful data → ~60 tokens

👉 You’re paying ~25x more than necessary.

And this happens on every request.
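You can get a feel for the tax yourself. The sketch below uses a crude chars-per-token heuristic (about 4 characters per token) rather than a real tokenizer like tiktoken, so the numbers are approximate, and the `rough_tokens` helper and the sample payload are just illustrations:

```python
import json

def rough_tokens(payload: dict) -> int:
    """Very rough token estimate: ~4 characters per token.
    For exact counts, run the text through a real tokenizer (e.g. tiktoken)."""
    return len(json.dumps(payload)) // 4

# A payload shaped like the example above: 100 item objects the LLM never needs
full = {"order": {"id": 123,
                  "user": {"name": "Midhun", "email": "midhun@email.com"},
                  "items": [{"sku": f"SKU-{i}", "qty": 1} for i in range(100)]}}
trimmed = {"name": "Midhun", "email": "midhun@email.com"}

print(rough_tokens(full), rough_tokens(trimmed))
```

Even with this blunt estimate, the full payload comes out more than 20x larger than the trimmed one.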


🧠 “I’ll just parse it manually”

Sure… you can do this:

user = data.get("order", {}).get("user", {})
email = user.get("email")

Now imagine:

  • 10+ fields

  • deeply nested structures

  • multiple APIs

You end up writing:

  • defensive null checks

  • brittle parsing logic

  • repeated boilerplate everywhere

It’s not hard… just annoying and error-prone.
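You can wrap the null checks in a helper, but you still end up writing and maintaining it in every service. A minimal sketch (the `dig` helper is hypothetical, not from any library):

```python
def dig(data: dict, path: str, default=None):
    """Walk a dotted path like 'order.user.email', returning default
    if any level is missing -- the defensive checks you'd otherwise repeat."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

data = {"order": {"user": {"name": "Midhun", "email": "midhun@email.com"}}}
print(dig(data, "order.user.email"))   # midhun@email.com
print(dig(data, "order.user.phone"))   # None
```

It works, but it’s exactly the boilerplate the list above describes: every team reinvents some version of it.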


⚡ The smarter approach: preprocess your data

Instead of sending raw JSON to your LLM:

👉 clean it first

Use a small extraction step to pull only what you need.

For example:

{
  "data": {...},
  "queries": {
    "email": ".order.user.email",
    "name": ".order.user.name"
  }
}

Output:

{
  "email": "midhun@email.com",
  "name": "Midhun"
}
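The extraction step itself is simple. Here’s an illustrative stdlib-only sketch of the pattern (not the actual service implementation): it takes jq-style dotted queries like the ones above and returns only the requested fields.

```python
def extract(data: dict, queries: dict) -> dict:
    """Run each jq-style dotted query (e.g. '.order.user.email')
    against data and return only the requested fields."""
    result = {}
    for name, query in queries.items():
        current = data
        for key in query.strip(".").split("."):
            if not isinstance(current, dict):
                current = None
                break
            current = current.get(key)
        result[name] = current
    return result

payload = {"order": {"id": 123,
                     "user": {"name": "Midhun", "email": "midhun@email.com"}}}
queries = {"email": ".order.user.email", "name": ".order.user.name"}
print(extract(payload, queries))
# {'email': 'midhun@email.com', 'name': 'Midhun'}
```

The trimmed dict is what goes into your prompt; the raw payload never touches the LLM.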

💰 Why this matters more than you think

Let’s do rough math:

Payload        Tokens   Cost (per 1k calls)
Raw JSON       ~1500    ~$45
Cleaned JSON   ~60      ~$1

👉 That’s a 97% reduction

Now multiply that by:

  • daily requests

  • production scale

This is not optimization.

This is cost control.


🔧 So how do you actually do this?

You have two options:

Option 1: Local parsing

  • Use JSONPath libraries

  • Write custom logic

  • Maintain it across services

Option 2: Use a preprocessing layer

I ended up building a small tool for this after getting tired of repeating the same parsing logic.

It’s basically:

“JSON query engine as a service”

You send:

  • raw JSON

  • a query

You get:

  • clean, minimal payload

No setup, no dependencies.


🔗 Real-world use cases

This pattern is surprisingly useful:

🤖 AI pipelines

Reduce token usage before sending data to LLMs

🔔 Webhooks

Clean payloads from:

  • Stripe

  • Shopify

  • GitHub
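The same trimming idea applies to webhooks. Here’s a sketch against a simplified Stripe-style event (an illustrative shape, not the full Stripe schema):

```python
# A simplified Stripe-style webhook event (illustrative, not the real schema)
event = {
    "id": "evt_123",
    "type": "payment_intent.succeeded",
    "data": {"object": {"id": "pi_456", "amount": 2000, "currency": "usd",
                        "receipt_email": "midhun@email.com",
                        "metadata": {"internal_ref": "..."}}}
}

# Keep only what a downstream LLM (or handler) actually needs
obj = event.get("data", {}).get("object", {})
summary = {"event": event.get("type"),
           "amount": obj.get("amount"),
           "email": obj.get("receipt_email")}
print(summary)
# {'event': 'payment_intent.succeeded', 'amount': 2000, 'email': 'midhun@email.com'}
```

Three fields instead of a payload that, for real Stripe events, can run to hundreds of lines.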

📊 Logs & analytics

Extract only relevant fields from large datasets


⚡ A small shift, big impact

Most developers optimize:

  • prompts

  • model selection

But ignore:

the data they send

That’s where the real waste is.


🚀 Final thought

In the AI era:

Efficiency = profit

Before optimizing your prompts,

try optimizing your input.

You might be surprised how much you save.


👇 If you're curious

I turned this idea into a simple API you can try:

JSON PowerExtract (available on RapidAPI)

I included a Free Tier (500 requests/month) so you can test the token savings in your own pipeline today.

