Matt Lewandowski

Optimizing JSON for LLMs

If you're building AI-powered features, you're probably sending a lot of JSON to language models. And if you're sending a lot of JSON, you're probably burning through tokens faster than you'd like.

At Kollabe, we use AI to generate summaries, action items, and suggestions for retrospectives and standups. When you have dozens of team members submitting updates daily, the JSON payloads get large quickly. We needed ways to shrink our data without losing information.

There are newer formal solutions out there like TOON (Token-Oriented Object Notation) that can compress JSON for LLMs by up to 40%. It's a proper spec with benchmarks and SDKs. Worth checking out if you want a standardized approach.

But sometimes you want to stay in control. You don't want another dependency. You want to understand exactly what's being sent to the model and tweak it for your specific use case. These are the simple tricks we use to cut down token usage without adding complexity.

1. Replace Long IDs with Short Ones

UUIDs are everywhere. They're great for databases, but terrible for token efficiency.

// A UUID like this can cost 15+ tokens, depending on the tokenizer
"550e8400-e29b-41d4-a716-446655440000"

// This is only a couple of tokens
"u-1"

When you're referencing the same user across hundreds of standup entries, those extra tokens add up fast.

The solution: build a simple mapping as you process your data. First user you encounter becomes u-1, second becomes u-2, and so on. If you see the same UUID again, reuse the short ID you already assigned.

// Before: UUIDs everywhere
{
  odUserId: "550e8400-e29b-41d4-a716-446655440000",
  odQuestionId: "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  odAnswerId: "f47ac10b-58cc-4372-a567-0e02b2c3d479"
}

// After: short, prefixed IDs
{
  uid: "u-1",
  qid: "q-1", 
  aid: "a-1"
}

The key insight is that the same UUID always maps to the same short ID. So when the LLM sees u-1 multiple times across different answers, it understands those entries belong to the same person. Use different prefixes for different entity types so the model can distinguish between a user ID and a question ID.
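
Here's a minimal sketch of that mapper (makeIdMapper is just an illustrative name, not library code or our exact implementation):

function makeIdMapper(prefix: string) {
  const seen = new Map<string, string>();
  return (uuid: string): string => {
    // The same UUID always maps to the same short ID
    if (!seen.has(uuid)) seen.set(uuid, `${prefix}-${seen.size + 1}`);
    return seen.get(uuid)!;
  };
}

// One mapper per entity type keeps the prefixes distinct
const shortenUser = makeIdMapper("u");
shortenUser("550e8400-e29b-41d4-a716-446655440000"); // "u-1"
shortenUser("550e8400-e29b-41d4-a716-446655440000"); // "u-1" again
shortenUser("6ba7b810-9dad-11d1-80b4-00c04fd430c8"); // "u-2"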

2. Drop the Formatting

JSON.stringify takes second and third parameters that most people forget about. The third one controls indentation:

// Pretty printed (wasteful)
JSON.stringify(data, null, 2);

// Minified (efficient)
JSON.stringify(data);

The difference looks like this:

// Pretty: ~80 characters
{
  "name": "Alice",
  "role": "Engineer",
  "team": "Platform"
}

// Minified: ~45 characters
{"name":"Alice","role":"Engineer","team":"Platform"}

For small objects, whatever. For thousands of standup entries? That whitespace adds up. LLMs don't care about formatting anyway.

3. Use Shorter Key Names

This one feels obvious once you think about it. Compare:

// Verbose
type StandupEntry = {
  odUserId: string;
  userName: string;
  yesterdayUpdate: string;
  todayPlan: string;
  blockerDescription: string;
};

// Concise
type StandupEntry = {
  odUid: string;
  name: string;
  yesterday: string;
  today: string;
  blocker: string;
};

When you have hundreds of entries, shorter keys save real tokens. Just keep them readable enough that the LLM can understand the context.

A few rules we follow:

  • Drop redundant words: userId becomes id if it's clearly a user object
  • Use common abbreviations: desc instead of description
  • Keep it unambiguous: y for yesterday is too cryptic, but yest works fine
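
If you want to apply the shorter names programmatically rather than by hand, a small rename map does the job. A sketch, where keyMap and renameKeys are illustrative names rather than anything from our codebase:

const keyMap: Record<string, string> = {
  odUserId: "odUid",
  userName: "name",
  yesterdayUpdate: "yesterday",
  todayPlan: "today",
  blockerDescription: "blocker",
};

function renameKeys(obj: Record<string, unknown>): Record<string, unknown> {
  // Swap in the shorter alias when one exists; leave everything else untouched
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [keyMap[key] ?? key, value])
  );
}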

4. Remove Null and Empty Values

Don't send data that doesn't exist:

// Note: this cleans one level only; apply it recursively if your objects nest
function removeEmpty<T extends object>(obj: T): Partial<T> {
  return Object.fromEntries(
    Object.entries(obj).filter(([_, v]) => {
      if (v === null || v === undefined) return false; // missing values
      if (v === "") return false; // empty strings
      if (Array.isArray(v) && v.length === 0) return false; // empty arrays
      return true;
    })
  ) as Partial<T>;
}

// Before
{
  "name": "Alice",
  "blocker": null,
  "tags": [],
  "notes": ""
}

// After
{
  "name": "Alice"
}

If someone didn't report a blocker, why tell the LLM about it?

5. Flatten Nested Structures When Possible

Sometimes nesting is just organizational overhead:

// Before
{
  "user": {
    "profile": {
      "name": "Alice",
      "team": "Platform"
    }
  },
  "update": "Finished feature"
}

// After
{
  "name": "Alice",
  "team": "Platform",
  "update": "Finished feature"
}

The second version conveys the same information with fewer structural tokens. Obviously don't flatten things if the hierarchy carries meaning, but often it doesn't.
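
For the entry above, the flattening can be as simple as this (flattenEntry and NestedEntry are illustrative names for the sketch, not production code):

type NestedEntry = {
  user: { profile: { name: string; team: string } };
  update: string;
};

function flattenEntry(entry: NestedEntry) {
  // Pull the nested profile fields up to the top level
  return {
    name: entry.user.profile.name,
    team: entry.user.profile.team,
    update: entry.update,
  };
}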

6. Use Arrays Instead of Repeated Objects

If you have a list of similar items, consider whether you need the full object structure for each:

// Before: 3 objects with repeated keys
{
  "entries": [
    { "name": "Alice", "status": "done" },
    { "name": "Bob", "status": "blocked" },
    { "name": "Carol", "status": "done" }
  ]
}

// After: header row + data rows
{
  "cols": ["name", "status"],
  "rows": [
    ["Alice", "done"],
    ["Bob", "blocked"],
    ["Carol", "done"]
  ]
}

This trades some readability for efficiency. For large datasets, it's worth it.
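
One way to do the conversion generically, assuming every item shares the same keys (toTable is a hypothetical helper name):

function toTable<T extends Record<string, unknown>>(items: T[]) {
  // Use the first item's keys as the header row
  const cols = Object.keys(items[0] ?? {});
  return {
    cols,
    rows: items.map((item) => cols.map((col) => item[col])),
  };
}

// toTable([{ name: "Alice", status: "done" }, { name: "Bob", status: "blocked" }])
// => { cols: ["name", "status"], rows: [["Alice", "done"], ["Bob", "blocked"]] }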

7. Strip Unnecessary Metadata

Timestamps, audit fields, and internal IDs often aren't needed for AI processing:

// Before: full database record
{
  odAnswerId: "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  odUserId: "550e8400-e29b-41d4-a716-446655440000",
  text: "Great sprint!",
  createdAt: "2024-01-15T10:30:00.000Z",
  updatedAt: "2024-01-15T10:30:00.000Z",
  isDeleted: false,
  version: 1
}

// After: just what the LLM needs
{
  uid: "u-1",
  text: "Great sprint!"
}

Ask yourself: does the model actually need this field to generate a useful response? If not, drop it.
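
The simplest way to enforce that is an explicit allowlist per entity type. A sketch (pick is an illustrative helper, not from our codebase):

function pick<T extends object, K extends keyof T>(obj: T, keys: K[]): Pick<T, K> {
  // Keep only the allowlisted fields
  return Object.fromEntries(
    keys.filter((key) => key in obj).map((key) => [key, obj[key]])
  ) as Pick<T, K>;
}

// e.g. pick(answer, ["uid", "text"]) keeps just what the prompt needs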

8. Represent Booleans Efficiently

For boolean flags, consider whether you even need the field when it's false:

// Before
{ "name": "Alice", "isAdmin": false, "isActive": true, "isVerified": false }

// After: only include truthy flags
{ "name": "Alice", "active": true }

// Or use a flags array for multiple true values
{ "name": "Alice", "flags": ["active", "verified"] }

If most users aren't admins, don't include isAdmin: false on every record.
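
If you go the flags-array route, the conversion is mechanical. A sketch (toFlags is a hypothetical name; it assumes your booleans use the is-prefix convention):

function toFlags(booleans: Record<string, boolean>): string[] {
  // Keep only the true flags and strip the "is" prefix: isActive -> "active"
  return Object.entries(booleans)
    .filter(([, value]) => value)
    .map(([key]) => key.replace(/^is/, "").toLowerCase());
}

// toFlags({ isAdmin: false, isActive: true, isVerified: true })
// => ["active", "verified"]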

Putting It All Together

Here's a before and after from a real retrospective summary we generate at Kollabe:

Before optimization:

{
  "retrospectiveData": {
    "questions": [
      {
        "odQuestionId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
        "questionText": "What went well this sprint?",
        "questionType": "positive"
      },
      {
        "odQuestionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
        "questionText": "What could be improved?",
        "questionType": "negative"
      }
    ],
    "answers": [
      {
        "odAnswerId": "1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed",
        "odQuestionId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
        "odUserId": "550e8400-e29b-41d4-a716-446655440000",
        "userName": "Alice Chen",
        "answerText": "Team collaboration was excellent during the release",
        "createdAt": "2024-01-15T10:30:00.000Z",
        "voteCount": 3
      },
      {
        "odAnswerId": "6ec0bd7f-11c0-43da-975e-2a8ad9ebae0b",
        "odQuestionId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
        "odUserId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "userName": "Bob Smith",
        "answerText": "CI/CD pipeline improvements saved us hours",
        "createdAt": "2024-01-15T10:32:00.000Z",
        "voteCount": 5
      },
      {
        "odAnswerId": "3f333df6-90a4-4fda-8dd3-9485d27cee36",
        "odQuestionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
        "odUserId": "550e8400-e29b-41d4-a716-446655440000",
        "userName": "Alice Chen",
        "answerText": "Documentation was often outdated",
        "createdAt": "2024-01-15T10:35:00.000Z",
        "voteCount": null
      }
    ]
  }
}

After optimization:

{"qs":[{"id":"q-1","text":"What went well this sprint?","type":"positive"},{"id":"q-2","text":"What could be improved?","type":"negative"}],"ans":[{"id":"a-1","qid":"q-1","uid":"u-1","name":"Alice Chen","text":"Team collaboration was excellent during the release","votes":3},{"id":"a-2","qid":"q-1","uid":"u-2","name":"Bob Smith","text":"CI/CD pipeline improvements saved us hours","votes":5},{"id":"a-3","qid":"q-2","uid":"u-1","name":"Alice Chen","text":"Documentation was often outdated"}]}

What changed:

  • UUIDs replaced with short IDs (q-1, a-1, u-1)
  • Long key names shortened (odQuestionId to qid, answerText to text)
  • Removed wrapper object (retrospectiveData)
  • Dropped null values (no voteCount: null)
  • Removed timestamps (not needed for summary generation)
  • No whitespace formatting

The optimized version is roughly 50% smaller. When you're processing retrospectives for a 50-person team with hundreds of answers, that's a meaningful reduction in token costs and faster inference times.
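
For reference, the whole pipeline can be one small function that chains the tricks above. This is a rough sketch mirroring the "before" payload (optimizeForLlm and the RawRetro type are illustrative names), not our exact production code:

type RawRetro = {
  questions: { odQuestionId: string; questionText: string; questionType: string }[];
  answers: {
    odAnswerId: string; odQuestionId: string; odUserId: string;
    userName: string; answerText: string; voteCount: number | null;
  }[];
};

function optimizeForLlm(data: RawRetro): string {
  // Short-ID mappers, one per entity type (trick 1)
  const makeIdMapper = (prefix: string) => {
    const seen = new Map<string, string>();
    return (uuid: string) => {
      if (!seen.has(uuid)) seen.set(uuid, `${prefix}-${seen.size + 1}`);
      return seen.get(uuid)!;
    };
  };
  const uid = makeIdMapper("u");
  const qid = makeIdMapper("q");
  const aid = makeIdMapper("a");

  const payload = {
    qs: data.questions.map((q) => ({
      id: qid(q.odQuestionId),
      text: q.questionText,
      type: q.questionType,
    })),
    ans: data.answers.map((a) => ({
      id: aid(a.odAnswerId),
      qid: qid(a.odQuestionId),
      uid: uid(a.odUserId),
      name: a.userName,
      text: a.answerText,
      // Drop null vote counts instead of sending them (trick 4)
      ...(a.voteCount != null ? { votes: a.voteCount } : {}),
    })),
  };

  // Minified output, no whitespace (trick 2)
  return JSON.stringify(payload);
}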

When to Optimize

Not every JSON payload needs this treatment. If you're sending a small config object or a single user query, the overhead of optimization isn't worth it.

But when you're building features that process large amounts of structured data, like we do with retrospective and standup summaries at Kollabe, these tricks make a real difference. They're simple to implement, don't require external dependencies, and give you immediate wins.

There's also something to be said for staying in control of your data pipeline. When you write your own optimization layer, you understand exactly what's happening. You can tweak the short ID prefixes, decide which fields to drop, and adjust the strategy as your data evolves. No black boxes.

The best part? LLMs handle optimized JSON just fine. They don't need pretty formatting or verbose key names to understand your data. They just need the information.


Oh and one last shameless plug: if you work on an agile dev team, check out my free planning poker and retrospective tool called Kollabe. We use all these tricks to power our AI summaries.

Top comments (7)

Ruan Aragão

Very interesting content! I'll be using an LLM in a project soon, so this will come in handy. Thank you!

Arlow

TOON seems interesting. Never heard of it before. I wonder how accurate the results are for complex data sets.

Kelly

It can also help with latency! Tokens take time to process, and at scale, the less you send, the faster the response!

Alfatech

This hits an important point many people overlook: fewer fields mean fewer hallucinations and lower costs. In practice, consistent field naming and limiting optional fields made the biggest difference for me. Have you measured any before/after metrics (accuracy, latency, cost)?

Matt Lewandowski

Agreed, limiting fields makes a huge difference in itself! We use Langfuse for all of our metrics. Accuracy has stayed mostly the same, while read tokens and our p99 latency have gone down. This mostly affects our larger requests, which aren't the majority; most teams using our tools are reasonably small.

Ali Farhat

@mattlewandowski93 would you mind adding our tool to your article? 😇
scalevise.com/json-toon-converter
