Tags: #ai #webdev #javascript #opensource #ChatGPT #Claude #Copilot #Gemini
Every time I pasted a spreadsheet or XML file into Claude or ChatGPT, something bothered me.
I was burning tokens on things the AI didn't need.
A 120-row sales CSV with 10 verbose columns spends thousands of tokens on long identifiers like transaction_date and repetitive values. An XML response from an enterprise API can spend 40% of its tokens on namespace declarations like xmlns:ns0="http://schemas.example.com/data" before a single byte of real data appears. A JSON array of 200 objects repeats every key 200 times.
The AI doesn't need any of that. It needs the data.
So I built TokenPinch — a tool that converts files to a compact .pinch format before you paste them into any AI. And in the process, I ended up designing what I think could be a useful open format for token-efficient data exchange with LLMs.
The problem in numbers
Take a typical sales CSV with verbose headers:
transaction_date,customer_identifier,product_category,product_name,unit_price,quantity_ordered,discount_percentage,revenue_total,warehouse_location,sales_representative
2024-01-01,CUST1000,Electronics,Laptop,999.99,2,0,1999.98,Miami-FL,Maria Lopez
2024-01-02,CUST1001,Apparel,Jacket,59.99,1,10,53.99,Dallas-TX,Carlos Rivera
...120 more rows
That's approximately 3,400 tokens, and a large share of it goes to verbose identifiers and repetitive values rather than information the model actually needs.
Now look at the same data in .pinch format:
[AIX-FORMAT v1.0]
Source: sales_data.csv | Rows: 120 | Strategy: TOON tabular + header aliasing
Decode: expand [ALIASES] to reconstruct column names, then parse [DATA] rows in header order.
Generated-by: TokenPinch (tokenpinch.com)
[/AIX-FORMAT]
[ALIASES]
transaction_date:td | customer_identifier:ci | product_category:pc | product_name:pn | unit_price:up | quantity_ordered:qo | discount_percentage:dp | revenue_total:rt | warehouse_location:wl | sales_representative:sr
[/ALIASES]
[DATA]
sales_data[120]{td,ci,pc,pn,up,qo,dp,rt,wl,sr}:
2024-01-01,CUST1000,Electronics,Laptop,999.99,2,0,1999.98,Miami-FL,Maria Lopez
2024-01-02,CUST1001,Apparel,Jacket,59.99,1,10,53.99,Dallas-TX,Carlos Rivera
...
[/DATA]
That's approximately 900 tokens. Same data, 74% fewer tokens.
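To make the round trip concrete, here is a minimal decoder sketch for the tabular layout shown above. It assumes the exact block structure from the example (no quoting or escape rules) and the function name is mine, not part of the spec:

```python
import re

def decode_pinch_table(text):
    """Parse the [ALIASES] and [DATA] blocks of a tabular .pinch file
    back into a list of dicts keyed by full column names.
    Sketch only: assumes the layout shown above, no quoting/escapes."""
    # Map short aliases back to full column names.
    aliases_block = re.search(r"\[ALIASES\]\n(.*?)\n\[/ALIASES\]", text, re.S).group(1)
    alias_to_full = {}
    for pair in aliases_block.split("|"):
        full, alias = pair.strip().split(":")
        alias_to_full[alias] = full

    # Read the "name[N]{a,b,...}:" header line, then the comma-separated rows.
    data_block = re.search(r"\[DATA\]\n(.*?)\n\[/DATA\]", text, re.S).group(1)
    lines = data_block.strip().splitlines()
    cols = re.search(r"\{(.*?)\}", lines[0]).group(1).split(",")
    rows = []
    for line in lines[1:]:
        if line.strip() == "...":
            continue  # truncation marker, not data
        values = line.split(",")
        rows.append({alias_to_full[c]: v for c, v in zip(cols, values)})
    return rows
```

Any LLM follows the same steps from the plain-English Decode instruction; the Python version just makes the contract explicit.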
At GPT-4o prices ($2.50/1M tokens), that's $0.006 saved per file. If you're processing 1,000 files a month, that's $6 back. If you're building an application that feeds files to an LLM at scale, the savings compound fast.
How .pinch works
The format has three compression strategies, applied automatically based on the source file type.
Strategy 1: TOON tabular + header aliasing (CSV, Excel, JSON arrays)
Inspired by TOON (Token-Oriented Object Notation), this strategy:
- Declares column headers once at the top with short aliases
- Uses aliases in all data rows instead of full names
- Wraps everything in a self-describing block the LLM can decode
Alias generation follows a priority chain:
- First, try initials: `transaction_date` → `td`, `customer_full_name` → `cfn`
- Then, first 2 chars of word 1 + first char of word 2: `product_name` → `prn`
- Then, first 2 characters: `status` → `st`
- Fallback: letter + number: `a1`, `a2`
All aliases are guaranteed unique within a file.
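The priority chain above can be sketched in a few lines. This is my reading of the rules, not the tool's actual implementation:

```python
def make_alias(name, used):
    """Generate a short alias for a column name using the priority chain
    described above; `used` tracks aliases already taken in this file."""
    words = [w for w in name.split("_") if w]
    candidates = ["".join(w[0] for w in words)]      # 1. initials: transaction_date -> td
    if len(words) >= 2:
        candidates.append(words[0][:2] + words[1][0])  # 2. product_name -> prn
    candidates.append(name[:2])                       # 3. first two chars: status -> st
    for c in candidates:
        if c not in used:
            used.add(c)
            return c
    # 4. Fallback: letter + number, guaranteed unique within the file
    n = 1
    while f"a{n}" in used:
        n += 1
    used.add(f"a{n}")
    return f"a{n}"
```

Because every candidate is checked against the set of aliases already issued, collisions fall through to the next rule, which is what guarantees uniqueness.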
Strategy 2: XML namespace stripping (XML, SOAP)
Enterprise XML is brutal for token consumption. A SOAP response can have 8 namespace declarations adding hundreds of tokens before the actual data starts.
.pinch strips:
- `<?xml ?>` declarations
- All `xmlns:*` attribute declarations
- Namespace prefixes from element names: `<ns0:Response>` → `<Response>`
- Namespace prefixes from attributes: `ns1:ID="X"` → `ID="X"`
- Empty elements and redundant whitespace
The result is clean, standard XML with all content and attributes preserved.
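A regex sketch of Strategy 2, for illustration only; a production version would want a real XML parser rather than pattern matching:

```python
import re

def strip_namespaces(xml):
    """Strip the XML declaration, xmlns attributes, and namespace prefixes,
    as described in Strategy 2. Regex sketch, not a full XML parser."""
    xml = re.sub(r"<\?xml[^?]*\?>", "", xml)           # <?xml ... ?> declaration
    xml = re.sub(r'\s+xmlns(:\w+)?="[^"]*"', "", xml)  # xmlns:* declarations
    xml = re.sub(r"(</?)\w+:", r"\1", xml)             # prefixes on element names
    xml = re.sub(r"\s(\w+):(\w+=)", r" \2", xml)       # prefixes on attributes
    return xml.strip()
```

Element content and attribute values pass through untouched, which is why the output stays valid, parseable XML.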
Real example: I tested this with a large enterprise XML API response containing 8 namespace declarations. The .pinch version was 60% smaller, and when I uploaded it to ChatGPT as a file attachment, the model interpreted it perfectly, identifying the data structure, fields, and values without any special instructions.
Strategy 3: Text normalization (Word, PDF)
For .docx and .pdf files, the strategy is simpler:
- Extract plain text (ignoring images and formatting)
- Normalize whitespace: collapse multiple spaces, remove trailing spaces, max 2 consecutive blank lines
- For PDFs: prefix each page with `[Page N]`
This typically saves 15–30% on documents with irregular whitespace — which is almost every PDF converted from a scanned or formatted source.
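The whitespace rules can be expressed in a few lines. A sketch of my reading of the strategy, assuming spaces and tabs are what get collapsed:

```python
import re

def normalize_text(text):
    """Strategy 3 whitespace normalization: collapse runs of spaces/tabs,
    drop trailing spaces, and cap consecutive blank lines at two."""
    lines = [re.sub(r"[ \t]+", " ", line).rstrip() for line in text.splitlines()]
    out, blanks = [], 0
    for line in lines:
        blanks = blanks + 1 if line == "" else 0
        if blanks <= 2:  # allow at most 2 consecutive blank lines
            out.append(line)
    return "\n".join(out)
```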
The self-describing header
The most important design decision in .pinch is the [AIX-FORMAT] header.
Every .pinch file starts with a block that tells any LLM exactly how to decode it:
[AIX-FORMAT v1.0]
Source: filename.ext | Strategy: <strategy used>
Decode: <plain English instruction>
Generated-by: TokenPinch (tokenpinch.com)
[/AIX-FORMAT]
This means .pinch works without any system prompt changes, without native platform support, and without any configuration. You paste it or upload it, and the AI reads the instructions and works with the data normally.
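Generating the header is deliberately trivial, which is the point: any pipeline can emit it. A sketch following the field layout of the example above (the function name is mine):

```python
def make_header(source, strategy, decode_hint):
    """Build the self-describing [AIX-FORMAT] block that opens every
    .pinch file, following the field layout shown above."""
    return (
        "[AIX-FORMAT v1.0]\n"
        f"Source: {source} | Strategy: {strategy}\n"
        f"Decode: {decode_hint}\n"
        "Generated-by: TokenPinch (tokenpinch.com)\n"
        "[/AIX-FORMAT]"
    )
```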
Claude, ChatGPT, Gemini, and Copilot have all handled it correctly in testing.
The no-compression rule
One thing I learned building this: not every file benefits from compression.
If the compressed output would consume more tokens than the original, TokenPinch skips the conversion and tells the user to paste the original file directly. This happens with very small files where the format overhead outweighs the savings, or with files that already have short column names and short values.
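The check itself is a one-liner once you have a token estimate. The ~4-characters-per-token heuristic below is my stand-in; the actual tool may use a real tokenizer:

```python
def should_compress(original, pinched):
    """The no-compression rule: skip conversion when the .pinch output
    would not be smaller. Uses a crude ~4-chars-per-token estimate."""
    est = lambda text: max(1, len(text) // 4)  # rough token count
    return est(pinched) < est(original)
```

On a tiny CSV, the fixed `[AIX-FORMAT]` and `[ALIASES]` overhead alone can push the comparison the wrong way, which is exactly the case the rule catches.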
Honesty about when the tool doesn't help is part of the value.
Implementation
TokenPinch is a single-page web app built with the following stack:
- SheetJS for Excel parsing
- mammoth.js for Word document text extraction
- pdf.js for PDF text extraction
- Vanilla JS for everything else
The entire tool is ~900 lines of HTML/CSS/JS.
The .pinch spec is open
I've published the format specification on GitHub. The spec covers:
- Complete file structure and block syntax
- All three compression strategies with examples
- Alias generation algorithm
- Decoder instructions for LLMs
- A reference Python implementation
The goal is for .pinch to become a standard interchange format for feeding structured data to LLMs — something any tool or pipeline can implement, regardless of language or platform.
If you're building something that sends files to LLMs and want to reduce token costs, you can implement the format from the spec without using the TokenPinch tool at all.
What's next
- Developer API — integrate .pinch compression into pipelines programmatically
- More formats — YAML, SQL dumps, Parquet
- Prompt templates — pre-built prompts for common tasks (summarize, find anomalies, convert to JSON)
- Compression strategy improvements — review the errors reported against the current beta's algorithms and fix them
Try it
Tool: tokenpinch.com — free, no signup, runs entirely in your browser
Spec: github.com/javiercast/pinch-format
If you work with structured data and LLMs regularly, I'd love to hear whether this solves a real problem for you — or what's missing.
Built over a few sessions with Claude's help. Yes, I used an AI to build a tool for using AI more efficiently. The irony is not lost on me.