Cut LLM prompt tokens on structured data — losslessly
A small, dependency-free tool for shrinking logs, JSON, and CSV in prompts — without dropping a single byte.
Logs, JSON, and CSV are some of the bulkiest, most repetitive things we feed into LLMs. They're also where prompt-token costs quietly pile up.
The trouble with lossy compression
The usual fix is semantic compression: have a model summarize the input and drop "low-information" tokens. It works — until the question needs the data that got dropped.
Ask:
"How many errors are in this log?"
"What's the total across these 400 rows?"
…and a lossy compressor can hand back a confident, wrong answer — because the rows it discarded were exactly the ones you needed. The compression looks great. The answer is broken.
A different bet: lossless or no-op
ctxfold takes the opposite approach. Its single rule:
Lossless or no-op. Never lossy.
Instead of summarizing, it re-encodes structure. Logs, JSON arrays, and CSV are tables in disguise — the same keys, prefixes, and templates repeat on every line. ctxfold lifts those repeated parts into a one-time header and keeps only what varies per row, producing a compact, self-labeling table the model reads directly. Nothing is dropped.
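To make the "header plus varying cells" idea concrete, here is a minimal sketch for the JSON-array case. It is not ctxfold's actual output format or API: `foldRecords`, `unfoldRecords`, and the line-per-row layout are hypothetical names and choices made for illustration. It assumes the records came from `JSON.parse` and all share the same keys in the same order.

```javascript
// Hypothetical sketch, not ctxfold's real format: lift the shared keys of a
// JSON array into one header line and keep only the values on each row.
function foldRecords(records) {
  if (!Array.isArray(records) || records.length === 0) return null;
  const isRecord = (r) => r !== null && typeof r === "object" && !Array.isArray(r);
  if (!records.every(isRecord)) return null;

  const keys = Object.keys(records[0]);
  const sameShape = records.every((r) => {
    const k = Object.keys(r);
    return k.length === keys.length && k.every((name, i) => name === keys[i]);
  });
  if (!sameShape) return null; // not a clean table: the caller keeps the raw text

  // Each line is itself valid JSON, so values never collide with a delimiter.
  const lines = [JSON.stringify(keys)];
  for (const r of records) lines.push(JSON.stringify(keys.map((k) => r[k])));
  return lines.join("\n");
}

// Inverse of foldRecords: re-attach the header keys to every row.
function unfoldRecords(folded) {
  const [headerLine, ...rowLines] = folded.split("\n");
  const keys = JSON.parse(headerLine);
  return rowLines.map((line) => {
    const values = JSON.parse(line);
    return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
  });
}

const records = [
  { level: "error", code: 500, path: "/api/a" },
  { level: "info", code: 200, path: "/api/b" },
];
const folded = foldRecords(records);
console.log(folded);
// ["level","code","path"]
// ["error",500,"/api/a"]
// ["info",200,"/api/b"]
```

Note that this sketch round-trips the parsed data, not the original bytes: whitespace in the source JSON is not preserved. A byte-exact guarantee needs a check against the original text.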
The guarantee is enforced in code: every encoder ships with a decoder, and compress() verifies that decoding its output reproduces the input before returning it. If it can't, you get your original text back, untouched. It can't corrupt your data — worst case, it does nothing.
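The verify-then-return rule can be sketched as a small wrapper around any encoder/decoder pair. This is an illustration of the principle, not ctxfold's implementation: `losslessOrNoop`, `encodePrefix`, and `decodePrefix` are hypothetical, and character length stands in for a real token count.

```javascript
// Hypothetical wrapper: the result is either a verified round trip or the
// untouched input. Character length is used as a stand-in for token count.
function losslessOrNoop(input, encode, decode) {
  try {
    const encoded = encode(input);
    if (encoded.length < input.length && decode(encoded) === input) {
      return { text: encoded, applied: true };
    }
  } catch (err) {
    // Any encoder or decoder failure falls through to the no-op path.
  }
  return { text: input, applied: false };
}

// Toy encoder: lift the longest prefix shared by every line into a header.
function encodePrefix(text) {
  const lines = text.split("\n");
  let prefix = lines[0];
  for (const line of lines) {
    while (!line.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  const rest = lines.map((line) => line.slice(prefix.length));
  return JSON.stringify(prefix) + "\n" + rest.join("\n");
}

// Toy decoder: read the header, then prepend the prefix to every row.
function decodePrefix(encoded) {
  const cut = encoded.indexOf("\n");
  const prefix = JSON.parse(encoded.slice(0, cut));
  return encoded
    .slice(cut + 1)
    .split("\n")
    .map((row) => prefix + row)
    .join("\n");
}

const log = "svc=api route=/a\nsvc=api route=/b\nsvc=api route=/c";
const ok = losslessOrNoop(log, encodePrefix, decodePrefix);
console.log(ok.applied); // true

// A lossy "encoder" (it trims whitespace) fails verification, so the input comes back as-is.
const rejected = losslessOrNoop("  padded  ", (s) => s.trim(), (s) => s);
console.log(rejected.applied, rejected.text === "  padded  "); // false true
```

The point of the design is that correctness does not depend on trusting the encoder: a buggy or lossy encoder only costs you the savings, never the data.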
Does the model still read it?
Yes. On real data, ctxfold cuts ~35–40% of tokens on templated logs and JSON arrays, fully losslessly. Because the output is plain, labeled text, the model reads it as well as the raw input: in lookup tests against GPT-4o-mini, answers from the compressed form matched answers from the raw data, field for field.
(Readability is validated on GPT-4o-mini; the lossless guarantee is model-independent.)
Try it
npm install ctxfold
const { compress } = require("ctxfold");
const { text, stats } = compress(bigLogOrJsonOrCsv);
// send `text` instead of the original
console.log(`${(stats.tokenRatio * 100).toFixed(0)}% fewer tokens, lossless: ${stats.lossless}`);
It's a pure text transform — no API calls, no model, zero dependencies — so it works with any LLM.
Not a replacement — the other half
ctxfold isn't a competitor to semantic compression; it's the complement. Summarize when you want to extract a subset; use ctxfold when you want to shrink repetition without losing anything. It shines on structured data, not prose.
Why I built it
This started from a simple frustration: lossy prompt compressors gave impressive token savings, but on aggregate questions — counts, totals, "find this record" — the answers came back wrong, because the data needed to answer had been summarized away. Great compression, broken results. The fix wasn't a smarter summarizer; it was to stop dropping data at all. Repetitive structured text is compressible losslessly — you just have to treat it as structure instead of prose.
If you push a lot of logs, JSON, or CSV into prompts, I'd genuinely like to know what your payloads look like and whether the lossless tradeoff fits your use case. What's eating the most tokens in your prompts right now? Questions, critique, and edge cases that break it are all welcome in the comments.
Repo & docs: https://github.com/antrixy/ctxfold · npm: npm install ctxfold · MIT licensed.
Top comments (5)
Lossless compression is underrated for LLM workflows because structured data often contains the evidence, not just filler. Summarizing logs or JSON before the question is known can delete the one field that matters. A reversible transform gives you cost savings without turning debugging into a guessing game.
Exactly — "evidence, not filler" is the right distinction, and it's sharpest on aggregates. Here's a single line folding into a row. Raw:
The scaffolding repeats identically on every line — the 2026-06-…Z timestamp shell, the [...], the reqId= / latency_ms= / status= keys — so it's lifted into a one-time header and each row keeps only what varies:
~35–40% fewer tokens on real logs (measured with the GPT tokenizer), and because no line is dropped, the model works from the same information it would have from raw — it's not the compression making it guess. The reversibility is enforced: decode(encode(x)) has to reproduce the input byte-for-byte or it returns the original untouched, so it's lossless-or-no-op by construction rather than by hope.
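As a rough sketch of that template-lifting step: the log shape, sample lines, `foldLog`, `unfoldLog`, and the `{}` slot syntax below are all made up for illustration and are not ctxfold's real format. The sketch assumes every field is free of whitespace.

```javascript
// Hypothetical log shape and folded format, for illustration only.
const LINE = /^(\S+) \[(\w+)\] reqId=(\S+) latency_ms=(\d+) status=(\d+)$/;
const TEMPLATE = "{} [{}] reqId={} latency_ms={} status={}";

// Lift the repeated scaffolding into one template line; keep only the cells.
function foldLog(text) {
  const rows = [];
  for (const line of text.split("\n")) {
    const m = LINE.exec(line);
    if (!m) return null; // one non-matching line: do not fold at all
    rows.push(m.slice(1).join(" "));
  }
  return [TEMPLATE, ...rows].join("\n");
}

// Refill the template slots, left to right, with each row's cells.
function unfoldLog(folded) {
  const [template, ...rows] = folded.split("\n");
  return rows
    .map((row) => {
      const cells = row.split(" ");
      let i = 0;
      return template.replace(/\{\}/g, () => cells[i++]);
    })
    .join("\n");
}

const sample = [
  "2026-06-01T12:00:00Z [INFO] reqId=a1 latency_ms=12 status=200",
  "2026-06-01T12:00:01Z [ERROR] reqId=a2 latency_ms=950 status=500",
].join("\n");

const foldedLog = foldLog(sample);
console.log(foldedLog);
// {} [{}] reqId={} latency_ms={} status={}
// 2026-06-01T12:00:00Z INFO a1 12 200
// 2026-06-01T12:00:01Z ERROR a2 950 500

console.log(unfoldLog(foldedLog) === sample); // true
```

Every row is still present in the folded text, so an aggregate question such as "how many ERROR lines?" is answered from the same evidence as in the raw log.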
That lossless-or-no-op rule is the part I like most. It turns compression from a prompt trick into a data contract: either the model receives an equivalent representation, or the pipeline refuses the optimization.
The other useful guardrail is keeping the schema close to the compressed rows. Once the column contract drifts away from the data, the token savings can become ambiguity again.
"Turns compression from a prompt trick into a data contract" — stealing that, it's better than my framing. Either the model gets an equivalent representation, or the pipeline refuses the optimization.
Your drift point is the sharp one. The reason it can't rot here: the schema isn't authored separately — it's derived from the rows at encode time, and compress() decodes its own output and checks it reproduces the input before returning, so a header that disagrees with its rows never ships (it no-ops instead). v0.1.3 (just published) adds a validate(payload) that re-runs that consistency check on any folded blob — catches a dropped cell, a truncated rows block, an out-of-range code. The one honest limit: it confirms a payload is sound and self-consistent, not that it faithfully matches an original it never saw. Hand-stitch a schema from one batch onto rows from another and that's outside what it can vouch for. Thanks for the nudge — good reason to ship the helper.
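For a sense of what a self-consistency check of this kind can look like, here is a sketch against a made-up payload shape. It is not the real validate(payload) from the library: `validateFolded` and the `{ columns, dict, rowCount, rows }` layout are assumptions for illustration, with `dict` holding lookup tables for columns whose cells are stored as integer codes.

```javascript
// Hypothetical payload shape, not ctxfold's real format:
// { columns: string[], dict: { [column]: string[] }, rowCount: number, rows: any[][] }
function validateFolded(payload) {
  const problems = [];
  const { columns, dict = {}, rowCount, rows } = payload;
  if (!Array.isArray(columns) || !Array.isArray(rows)) {
    return { ok: false, problems: ["missing columns or rows"] };
  }
  // A truncated rows block shows up as a mismatch with the declared count.
  if (rows.length !== rowCount) {
    problems.push(`expected ${rowCount} rows, found ${rows.length}`);
  }
  rows.forEach((row, r) => {
    // A dropped cell shows up as a row that is shorter than the header.
    if (!Array.isArray(row) || row.length !== columns.length) {
      problems.push(`row ${r}: expected ${columns.length} cells`);
      return;
    }
    columns.forEach((name, c) => {
      if (!Object.prototype.hasOwnProperty.call(dict, name)) return; // stored literally
      const code = row[c];
      if (!Number.isInteger(code) || code < 0 || code >= dict[name].length) {
        problems.push(`row ${r}: code ${code} out of range for "${name}"`);
      }
    });
  });
  return { ok: problems.length === 0, problems };
}

const good = {
  columns: ["level", "status"],
  dict: { level: ["INFO", "ERROR"] },
  rowCount: 2,
  rows: [[0, 200], [1, 500]],
};
console.log(validateFolded(good).ok); // true

const badCode = { ...good, rows: [[0, 200], [7, 500]] };
console.log(validateFolded(badCode).ok); // false
```

As in the reply above, a check like this can only say the payload agrees with itself. It cannot say the payload matches an original it never saw.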
That validate helper is the right boundary. It makes the optimization inspectable without pretending to prove history it never saw. I especially like the no-op fallback because it keeps compression from becoming another invisible source of model weirdness.