Maverick Y

Posted on Jun 27 • Edited on Jul 18

Cut LLM prompt tokens on structured data — losslessly

#llm #ai #javascript #opensource

A small, dependency-free tool for shrinking logs, JSON, and CSV in prompts — without dropping a single byte.

Logs, JSON, and CSV are some of the bulkiest, most repetitive things we feed into LLMs. They're also where prompt-token costs quietly pile up.

The trouble with lossy compression

The usual fix is semantic compression: have a model summarize the input and drop "low-information" tokens. It works — until the question needs the data that got dropped.

Ask:

"How many errors are in this log?"
"What's the total across these 400 rows?"

…and a lossy compressor can hand back a confident, wrong answer — because the rows it discarded were exactly the ones you needed. The compression looks great. The answer is broken.

A different bet: lossless or no-op

ctxfold takes the opposite approach. Its single rule:

Lossless or no-op. Never lossy.

Instead of summarizing, it re-encodes structure. Logs, JSON arrays, and CSV are tables in disguise — the same keys, prefixes, and templates repeat on every line. ctxfold lifts those repeated parts into a one-time header and keeps only what varies per row, producing a compact, self-labeling table the model reads directly. Nothing is dropped.
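To make "lift the repeated parts into a header" concrete, here is a minimal sketch for a JSON array of same-shaped objects. The names foldRecords and unfoldRecords and the `cols:` layout are made up for illustration; they are not ctxfold's API or its actual output format.

```javascript
// Illustrative only: foldRecords/unfoldRecords are hypothetical names,
// not ctxfold's API, and this layout is not ctxfold's real output format.
// Assumes the records came from JSON (no undefined values, functions, etc.).

// Fold an array of objects that all share the same keys in the same order.
// Returns null when the input does not fit, so the caller can send it raw.
function foldRecords(records) {
  if (!Array.isArray(records) || records.length === 0) return null;
  const isPlainObject = (r) => r !== null && typeof r === "object" && !Array.isArray(r);
  if (!records.every(isPlainObject)) return null;

  const keys = Object.keys(records[0]);
  const sameShape = records.every((r) => {
    const k = Object.keys(r);
    return k.length === keys.length && k.every((name, i) => name === keys[i]);
  });
  if (!sameShape) return null;

  // Keys are written once; each row is a JSON array of that record's values,
  // so every value stays verbatim and keeps its type.
  const header = "cols: " + JSON.stringify(keys);
  const rows = records.map((r) => JSON.stringify(keys.map((k) => r[k])));
  return [header, ...rows].join("\n");
}

// Inverse of foldRecords: re-attach the header keys to every row.
function unfoldRecords(folded) {
  const [header, ...rows] = folded.split("\n");
  const keys = JSON.parse(header.slice("cols: ".length));
  return rows.map((line) => {
    const values = JSON.parse(line);
    return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
  });
}

const records = [
  { level: "ERROR", scope: "billing", status: 500 },
  { level: "INFO", scope: "auth", status: 200 },
  { level: "ERROR", scope: "billing", status: 502 },
];

const folded = foldRecords(records);
console.log(folded);
console.log(folded.length, "chars folded vs", JSON.stringify(records).length, "chars raw");
```

The saving grows with the number of rows, because the key names are paid for once instead of once per record.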

The guarantee is enforced in code: every encoder ships with a decoder, and compress() verifies that decoding its output reproduces the input before returning it. If it can't, you get your original text back, untouched. It can't corrupt your data — worst case, it does nothing.
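The round-trip check can be sketched as a small wrapper around any encoder/decoder pair. This is an assumption-laden illustration of the pattern, not ctxfold's internals: losslessOrNoop, encodePrefix, and decodePrefix are hypothetical names, and the prefix codec is a toy stand-in.

```javascript
// Hypothetical helper, not ctxfold's internals: wraps an encoder/decoder pair
// so the result is either verified lossless or the untouched input.
function losslessOrNoop(input, encode, decode) {
  try {
    const encoded = encode(input);
    // Accept only if decoding reproduces the input exactly and the result is smaller.
    if (typeof encoded === "string" && decode(encoded) === input && encoded.length < input.length) {
      return { text: encoded, folded: true };
    }
  } catch (err) {
    // A codec that throws is treated the same as a failed round trip.
  }
  return { text: input, folded: false };
}

// Toy codec: hoist the longest prefix shared by every line into a one-line header.
function encodePrefix(text) {
  const lines = text.split("\n");
  let prefix = lines[0];
  for (const line of lines) {
    while (!line.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  return JSON.stringify(prefix) + "\n" + lines.map((l) => l.slice(prefix.length)).join("\n");
}

function decodePrefix(encoded) {
  const cut = encoded.indexOf("\n");
  const prefix = JSON.parse(encoded.slice(0, cut));
  return encoded.slice(cut + 1).split("\n").map((l) => prefix + l).join("\n");
}

const log = [
  "2026-06-26T07:26:48.730Z ERROR [billing] status=500",
  "2026-06-26T07:26:49.112Z ERROR [billing] status=502",
  "2026-06-26T07:26:51.004Z INFO [auth] status=200",
].join("\n");

// A sound codec passes the check and its output is used.
const safe = losslessOrNoop(log, encodePrefix, decodePrefix);

// A lossy "codec" that drops non-ERROR lines fails the check,
// so the caller gets the original text back.
const dropInfo = (t) => t.split("\n").filter((l) => l.includes("ERROR")).join("\n");
const lossy = losslessOrNoop(log, dropInfo, (t) => t);

console.log(safe.folded, lossy.folded);
```

The design choice worth noting is that the check runs on every call, at the cost of one extra decode, rather than relying on the encoder being correct in general.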

Does the model still read it?

For logs and JSON, yes. On real data, ctxfold cuts ~35–40% of tokens on templated logs and JSON arrays with no loss. And because the output is plain, labeled text, the model reads it as well as the raw input: in lookup tests against GPT-4o-mini, answers from the compressed form matched answers from the raw data, field for field.

(Readability is validated per format, on GPT-4o-mini; the lossless guarantee is model-independent. CSV turned out differently — see the update below.)

Update (July 2026)

When I wrote this, CSV readability was the one unvalidated cell in the benchmark table. I've since measured it with the same harness used for JSON and logs — and it failed: folded CSV scored 0/24 on GPT-4o-mini and 6–9/24 on GPT-4o, against 24/24 raw.

The root cause is structural. For JSON and logs, folding removes only syntax (keys, braces, templates), and every value stays verbatim in its row. CSV has no syntax to remove, so folding it factors the data itself, and models don't reliably reconstruct values through a header at read time.

So as of v0.1.4, CSV folding is documented as pipeline-mode: fold for lossless transit, decompress() before the model reads it. For direct model reading, send CSV raw. JSON and logs remain validated direct-readable.
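Pipeline mode might look roughly like the sketch below. The `fold` and `unfold` parameters stand in for a lossless pair such as ctxfold's compress() and decompress(); decompress()'s exact signature isn't shown in this post, so the pair is passed in rather than called directly. makeCsvPipeline is a hypothetical name, and the demo uses Node's built-in zlib as the stand-in codec.

```javascript
const zlib = require("node:zlib");

// Sketch of "pipeline mode": folded text is only stored or transmitted;
// the prompt is always built from the restored raw CSV.
// `fold` and `unfold` are placeholders for any lossless pair.
function makeCsvPipeline({ fold, unfold }) {
  return {
    // Producer side: shrink the CSV before it goes into a queue, cache, or request body.
    pack(csv) {
      return fold(csv);
    },
    // Consumer side: restore the raw CSV first, then build the prompt from it.
    buildPrompt(packed, question) {
      const csv = unfold(packed);
      return `${question}\n\n${csv}`;
    },
  };
}

// Stand-in codec for the demo only (gzip + base64), NOT ctxfold.
const pipeline = makeCsvPipeline({
  fold: (text) => zlib.gzipSync(text).toString("base64"),
  unfold: (b64) => zlib.gunzipSync(Buffer.from(b64, "base64")).toString("utf8"),
});

const csv = "id,region,amount\n1,eu-west,10\n2,eu-west,12\n3,us-east,7";
const packed = pipeline.pack(csv);
const prompt = pipeline.buildPrompt(packed, "What is the total of the amount column?");
console.log(prompt);
```

The point of the structure is that nothing downstream of buildPrompt ever sees the folded form, so the CSV readability result above stops mattering for what the model reads.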

v0.2.0 also shipped ctxfold --profile — it shows where your prompt's characters go and what folding would save, with the same measured-claims rules. Zero-setup demo: node examples/profile-demo.js after cloning the repo.

Full story of the CSV result and the profiler: I built a readability test for my own compression format. It scored 0/24.

Try it

npm install ctxfold

const { compress } = require("ctxfold");

const { text, stats } = compress(bigLogOrJsonOrCsv);
// send `text` instead of the original
console.log(`${(stats.tokenRatio * 100).toFixed(0)}% fewer tokens, lossless: ${stats.lossless}`);

It's a pure text transform — no API calls, no model, zero dependencies — so it works with any LLM.

Not a replacement — the other half

ctxfold isn't a competitor to semantic compression; it's the complement. Summarize when you need to extract a subset; use ctxfold to shrink repetition without losing anything. It shines on structured data, not prose.

Why I built it

This started from a simple frustration: lossy prompt compressors gave impressive token savings, but on aggregate questions — counts, totals, "find this record" — the answers came back wrong, because the data needed to answer had been summarized away. Great compression, broken results. The fix wasn't a smarter summarizer; it was to stop dropping data at all. Repetitive structured text is compressible losslessly — you just have to treat it as structure instead of prose.

If you push a lot of logs, JSON, or CSV into prompts, I'd genuinely like to know what your payloads look like and whether the lossless tradeoff fits your use case. What's eating the most tokens in your prompts right now? Questions, critique, and edge cases that break it are all welcome in the comments.

Repo & docs: https://github.com/antrixy/ctxfold · npm: npm install ctxfold · MIT licensed.

Top comments (5)

Alex Shev • Jun 27

Lossless compression is underrated for LLM workflows because structured data often contains the evidence, not just filler. Summarizing logs or JSON before the question is known can delete the one field that matters. A reversible transform gives you cost savings without turning debugging into a guessing game.

Maverick Y • Jun 29

Exactly — "evidence, not filler" is the right distinction, and it's sharpest on aggregates. Here's a single line folding into a row. Raw:

2026-06-26T07:26:48.730Z ERROR [billing] reqId=1898747 token validated latency_ms=736 status=500

The scaffolding repeats identically on every line — the 2026-06-…Z timestamp shell, the [...], the reqId= / latency_ms= / status= keys — so it's lifted into a one-time header and each row keeps only what varies:

cols: time level scope reqId latency_ms status message
07:26:48.730 ERROR billing 1898747 736 500 token validated

~35–40% fewer tokens on real logs (measured with the GPT tokenizer), and because no line is dropped, the model works from the same information it would have from raw — it's not the compression making it guess. The reversibility is enforced: decode(encode(x)) has to reproduce the input byte-for-byte or it returns the original untouched, so it's lossless-or-no-op by construction rather than by hope.

Alex Shev • Jun 30

That lossless-or-no-op rule is the part I like most. It turns compression from a prompt trick into a data contract: either the model receives an equivalent representation, or the pipeline refuses the optimization.

The other useful guardrail is keeping the schema close to the compressed rows. Once the column contract drifts away from the data, the token savings can become ambiguity again.

Maverick Y • Jun 30

"Turns compression from a prompt trick into a data contract": stealing that, it's better than my framing. Either the model gets an equivalent representation, or the pipeline refuses the optimization.

Your drift point is the sharp one. The reason it can't rot here: the schema isn't authored separately. It's derived from the rows at encode time, and compress() decodes its own output and checks it reproduces the input before returning, so a header that disagrees with its rows never ships (it no-ops instead).

v0.1.3 (just published) adds a validate(payload) that re-runs that consistency check on any folded blob. It catches a dropped cell, a truncated rows block, an out-of-range code.

The one honest limit: it confirms a payload is sound and self-consistent, not that it faithfully matches an original it never saw. Hand-stitch a schema from one batch onto rows from another and that's outside what it can vouch for. Thanks for the nudge, good reason to ship the helper.

Alex Shev • Jun 30

That validate helper is the right boundary. It makes the optimization inspectable without pretending to prove history it never saw. I especially like the no-op fallback because it keeps compression from becoming another invisible source of model weirdness.