<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Javier Castillo</title>
    <description>The latest articles on DEV Community by Javier Castillo (@javiercast).</description>
    <link>https://dev.to/javiercast</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3894895%2F531a3b60-9335-46ea-8cfe-57df1c6a1d84.jpg</url>
      <title>DEV Community: Javier Castillo</title>
      <link>https://dev.to/javiercast</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/javiercast"/>
    <language>en</language>
    <item>
      <title>I built a new file format to cut AI token costs by 70% — here's how it works</title>
      <dc:creator>Javier Castillo</dc:creator>
      <pubDate>Thu, 23 Apr 2026 20:44:37 +0000</pubDate>
      <link>https://dev.to/javiercast/i-built-a-new-file-format-to-cut-ai-token-costs-by-70-heres-how-it-works-4dhb</link>
      <guid>https://dev.to/javiercast/i-built-a-new-file-format-to-cut-ai-token-costs-by-70-heres-how-it-works-4dhb</guid>
      <description>&lt;p&gt;&lt;em&gt;Tags: #ai #webdev #javascript #opensource #ChatGPT #Claude #Copilot #Gemini&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every time I pasted a spreadsheet or XML file into Claude or ChatGPT, something bothered me.&lt;/p&gt;

&lt;p&gt;I was burning tokens on things the AI didn't need.&lt;/p&gt;

&lt;p&gt;A JSON array of 200 objects repeats every key 200 times. An XML response from an enterprise API can spend 40% of its tokens on namespace declarations like &lt;code&gt;xmlns:ns0="http://schemas.example.com/data"&lt;/code&gt; before a single byte of real data appears. Even a 120-row sales CSV with 10 columns, which declares its headers only once, burns tokens on verbose column names and formatting overhead.&lt;/p&gt;

&lt;p&gt;The AI doesn't need any of that. It needs the data.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://tokenpinch.com" rel="noopener noreferrer"&gt;TokenPinch&lt;/a&gt; — a tool that converts files to a compact &lt;code&gt;.pinch&lt;/code&gt; format before you paste them into any AI. And in the process, I ended up designing what I think could be a useful open format for token-efficient data exchange with LLMs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu6fkeq4fof40y5pc5zd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvu6fkeq4fof40y5pc5zd.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem in numbers
&lt;/h2&gt;

&lt;p&gt;Take a typical sales CSV with verbose headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;transaction_date,customer_identifier,product_category,product_name,unit_price,quantity_ordered,discount_percentage,revenue_total,warehouse_location,sales_representative
2024-01-01,CUST1000,Electronics,Laptop,999.99,2,0,1999.98,Miami-FL,Maria Lopez
2024-01-02,CUST1001,Apparel,Jacket,59.99,1,10,53.99,Dallas-TX,Carlos Rivera
...120 more rows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's approximately &lt;strong&gt;3,400 tokens&lt;/strong&gt;. Verbose column names, repeated values, and delimiter overhead account for a large share of that.&lt;/p&gt;

&lt;p&gt;Now look at the same data in &lt;code&gt;.pinch&lt;/code&gt; format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[AIX-FORMAT v1.0]
Source: sales_data.csv | Rows: 120 | Strategy: TOON tabular + header aliasing
Decode: expand [ALIASES] to reconstruct column names, then parse [DATA] rows in header order.
Generated-by: TokenPinch (tokenpinch.com)
[/AIX-FORMAT]

[ALIASES]
transaction_date:td | customer_identifier:ci | product_category:pc | product_name:pn | unit_price:up | quantity_ordered:qo | discount_percentage:dp | revenue_total:rt | warehouse_location:wl | sales_representative:sr
[/ALIASES]

[DATA]
sales_data[120]{td,ci,pc,pn,up,qo,dp,rt,wl,sr}:
2024-01-01,CUST1000,Electronics,Laptop,999.99,2,0,1999.98,Miami-FL,Maria Lopez
2024-01-02,CUST1001,Apparel,Jacket,59.99,1,10,53.99,Dallas-TX,Carlos Rivera
...
[/DATA]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's approximately &lt;strong&gt;900 tokens&lt;/strong&gt;. Same data, 74% fewer tokens.&lt;/p&gt;

&lt;p&gt;At GPT-4o input pricing ($2.50/1M tokens), that's about $0.006 saved per file. If you're processing 1,000 files a month, that's roughly $6 back. If you're building an application that feeds files to an LLM at scale, the savings compound fast.&lt;/p&gt;
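&lt;p&gt;As a quick sanity check on that arithmetic (the token counts and the $2.50/1M price are the figures quoted above, not measured here):&lt;/p&gt;

```python
# Savings estimate using the figures quoted above (3,400 -> 900 tokens,
# $2.50 per 1M input tokens). Illustrative arithmetic only.
PRICE_PER_TOKEN = 2.50 / 1_000_000

original_tokens = 3_400
pinched_tokens = 900
saved_per_file = (original_tokens - pinched_tokens) * PRICE_PER_TOKEN

print(f"saved per file: ~${saved_per_file:.4f}")
print(f"saved per 1,000 files: ~${saved_per_file * 1000:.2f}")
```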




&lt;h2&gt;
  
  
  How &lt;code&gt;.pinch&lt;/code&gt; works
&lt;/h2&gt;

&lt;p&gt;The format has three compression strategies, applied automatically based on the source file type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 1: TOON tabular + header aliasing (CSV, Excel, JSON arrays)
&lt;/h3&gt;

&lt;p&gt;Inspired by TOON (Token-Oriented Object Notation), this strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declares column headers &lt;strong&gt;once&lt;/strong&gt; at the top with short aliases&lt;/li&gt;
&lt;li&gt;Uses aliases in all data rows instead of full names&lt;/li&gt;
&lt;li&gt;Wraps everything in a self-describing block the LLM can decode&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Alias generation&lt;/strong&gt; follows a priority chain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, try initials: &lt;code&gt;transaction_date&lt;/code&gt; → &lt;code&gt;td&lt;/code&gt;, &lt;code&gt;customer_full_name&lt;/code&gt; → &lt;code&gt;cfn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If the initials are already taken, first 2 chars of word 1 + first char of word 2: &lt;code&gt;product_name&lt;/code&gt; → &lt;code&gt;prn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Then, first 2 characters: &lt;code&gt;status&lt;/code&gt; → &lt;code&gt;st&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fallback: letter + number: &lt;code&gt;a1&lt;/code&gt;, &lt;code&gt;a2&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All aliases are guaranteed unique within a file.&lt;/p&gt;
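&lt;p&gt;A minimal Python sketch of that priority chain (the spec ships a reference implementation; the function name &lt;code&gt;make_alias&lt;/code&gt; and the single-character skip rule are my assumptions, not the spec's code):&lt;/p&gt;

```python
import itertools

def make_alias(name: str, taken: set) -> str:
    # Priority chain from above: initials, then 2+1 chars, then first two
    # characters, then a numbered fallback. Single-character candidates are
    # skipped (my assumption; the spec's tie-breaking may differ).
    words = name.lower().split("_")
    candidates = [
        "".join(w[0] for w in words if w),                        # transaction_date -> td
        words[0][:2] + (words[1][0] if len(words) > 1 else ""),   # product_name -> prn
        name[:2].lower(),                                         # status -> st
    ]
    for cand in candidates:
        if len(cand) >= 2 and cand not in taken:
            taken.add(cand)
            return cand
    for i in itertools.count(1):                                  # fallback: a1, a2, ...
        if f"a{i}" not in taken:
            taken.add(f"a{i}")
            return f"a{i}"

taken = set()
headers = ["transaction_date", "customer_identifier", "product_name", "product_number"]
print([make_alias(h, taken) for h in headers])  # ['td', 'ci', 'pn', 'prn']
```

Note how &lt;code&gt;product_number&lt;/code&gt; falls through to the second tier because its initials (&lt;code&gt;pn&lt;/code&gt;) are already claimed by &lt;code&gt;product_name&lt;/code&gt;.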

&lt;h3&gt;
  
  
  Strategy 2: XML namespace stripping (XML, SOAP)
&lt;/h3&gt;

&lt;p&gt;Enterprise XML is brutal for token consumption. A SOAP response can have 8 namespace declarations adding hundreds of tokens before the actual data starts.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.pinch&lt;/code&gt; strips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;?xml ?&amp;gt;&lt;/code&gt; declarations&lt;/li&gt;
&lt;li&gt;All &lt;code&gt;xmlns:*&lt;/code&gt; attribute declarations
&lt;/li&gt;
&lt;li&gt;Namespace prefixes from element names: &lt;code&gt;&amp;lt;ns0:Response&amp;gt;&lt;/code&gt; → &lt;code&gt;&amp;lt;Response&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Namespace prefixes from attributes: &lt;code&gt;ns1:ID="X"&lt;/code&gt; → &lt;code&gt;ID="X"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Empty elements and redundant whitespace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is clean, standard XML with all content and attributes preserved.&lt;/p&gt;
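&lt;p&gt;A rough sketch of those stripping rules as Python regexes (the actual TokenPinch implementation is client-side JavaScript and may differ; for production use, a real XML parser is safer than regexes, and the empty-element cleanup is omitted here):&lt;/p&gt;

```python
import re

def strip_namespaces(xml: str) -> str:
    # Stripping rules from above, as a sketch. Empty-element removal
    # and whitespace cleanup are left out for brevity.
    xml = re.sub(r"<\?xml[^>]*\?>\s*", "", xml)           # drop the <?xml ?> declaration
    xml = re.sub(r"\s+xmlns(:\w+)?=\"[^\"]*\"", "", xml)  # drop xmlns declarations
    xml = re.sub(r"(</?)\w+:", r"\1", xml)                # <ns0:Response> -> <Response>
    xml = re.sub(r"\s\w+:(\w+=)", r" \1", xml)            # ns1:ID="X" -> ID="X"
    return xml

soap = ('<?xml version="1.0"?>'
        '<ns0:Response xmlns:ns0="http://schemas.example.com/data" '
        'ns1:ID="X">ok</ns0:Response>')
print(strip_namespaces(soap))  # <Response ID="X">ok</Response>
```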

&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt; I tested this with a large enterprise XML API response containing 8 namespace declarations. The &lt;code&gt;.pinch&lt;/code&gt; version was 60% smaller, and when I uploaded it to ChatGPT as a file attachment, ChatGPT interpreted it correctly — identifying the data structure, fields, and values without any special instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 3: Text normalization (Word, PDF)
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;.docx&lt;/code&gt; and &lt;code&gt;.pdf&lt;/code&gt; files, the strategy is simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract plain text (ignoring images and formatting)&lt;/li&gt;
&lt;li&gt;Normalize whitespace: collapse multiple spaces, remove trailing spaces, max 2 consecutive blank lines&lt;/li&gt;
&lt;li&gt;For PDFs: prefix each page with &lt;code&gt;[Page N]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This typically saves 15–30% on documents with irregular whitespace — which is almost every PDF converted from a scanned or formatted source.&lt;/p&gt;
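&lt;p&gt;The normalization rules above fit in a few lines of Python (assuming the plain text has already been extracted; the function name is illustrative, not the tool's):&lt;/p&gt;

```python
import re

def normalize_text(text: str) -> str:
    # Normalization rules from above, applied to already-extracted text.
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces/tabs
    text = re.sub(r" \n", "\n", text)         # drop trailing spaces
    text = re.sub(r"\n{4,}", "\n\n\n", text)  # keep at most 2 consecutive blank lines
    return text.strip()

print(normalize_text("Report   Title\n\n\n\n\n\nBody   text.   \n"))
```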




&lt;h2&gt;
  
  
  The self-describing header
&lt;/h2&gt;

&lt;p&gt;The most important design decision in &lt;code&gt;.pinch&lt;/code&gt; is the &lt;code&gt;[AIX-FORMAT]&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;Every &lt;code&gt;.pinch&lt;/code&gt; file starts with a block that tells any LLM exactly how to decode it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[AIX-FORMAT v1.0]
Source: filename.ext | Strategy: &amp;lt;strategy used&amp;gt;
Decode: &amp;lt;plain English instruction&amp;gt;
Generated-by: TokenPinch (tokenpinch.com)
[/AIX-FORMAT]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means &lt;code&gt;.pinch&lt;/code&gt; works without any system prompt changes, without native platform support, and without any configuration. You paste it or upload it, and the AI reads the instructions and works with the data normally.&lt;/p&gt;

&lt;p&gt;Claude, ChatGPT, Gemini, and Copilot have all handled it correctly in testing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The no-compression rule
&lt;/h2&gt;

&lt;p&gt;One thing I learned building this: not every file benefits from compression.&lt;/p&gt;

&lt;p&gt;If the compressed output would consume &lt;strong&gt;more tokens&lt;/strong&gt; than the original, TokenPinch skips the conversion and tells the user to paste the original file directly. This happens with very small files where the format overhead outweighs the savings, or with files that already have short column names and short values.&lt;/p&gt;
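&lt;p&gt;The guard can be sketched like this (the ~4-characters-per-token heuristic is my assumption for illustration; a real implementation would count with an actual tokenizer):&lt;/p&gt;

```python
def estimate_tokens(text: str) -> int:
    # Crude ~4-characters-per-token heuristic (an assumption for this
    # sketch; a real implementation would use an actual tokenizer).
    return max(1, len(text) // 4)

def should_pinch(original: str, pinched: str) -> bool:
    # Skip the conversion when the compressed output would cost more
    # tokens than the original, e.g. tiny files where the [AIX-FORMAT]
    # header overhead dominates.
    return estimate_tokens(pinched) < estimate_tokens(original)

tiny_csv = "a,b\n1,2\n"
with_header = "[AIX-FORMAT v1.0]\n...\n[/AIX-FORMAT]\n" + tiny_csv
print(should_pinch(tiny_csv, with_header))  # False: overhead outweighs savings
```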

&lt;p&gt;Honesty about when the tool doesn't help is part of the value.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;TokenPinch is a single-page web app built with the following stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SheetJS&lt;/strong&gt; for Excel parsing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mammoth.js&lt;/strong&gt; for Word document text extraction
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pdf.js&lt;/strong&gt; for PDF text extraction&lt;/li&gt;
&lt;li&gt;Vanilla JS for everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire tool is ~900 lines of HTML/CSS/JS.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;.pinch&lt;/code&gt; spec is open
&lt;/h2&gt;

&lt;p&gt;I've published the format specification on GitHub. The spec covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete file structure and block syntax&lt;/li&gt;
&lt;li&gt;All three compression strategies with examples&lt;/li&gt;
&lt;li&gt;Alias generation algorithm&lt;/li&gt;
&lt;li&gt;Decoder instructions for LLMs&lt;/li&gt;
&lt;li&gt;A reference Python implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is for &lt;code&gt;.pinch&lt;/code&gt; to become a standard interchange format for feeding structured data to LLMs — something any tool or pipeline can implement, regardless of language or platform.&lt;/p&gt;

&lt;p&gt;If you're building something that sends files to LLMs and want to reduce token costs, you can implement the format from the spec without using the TokenPinch tool at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer API&lt;/strong&gt; — to integrate &lt;code&gt;.pinch&lt;/code&gt; compression into pipelines programmatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More formats&lt;/strong&gt; — YAML, SQL dumps, Parquet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt templates&lt;/strong&gt; — pre-built prompts for common tasks (summarize, find anomalies, convert to JSON)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression strategy improvements&lt;/strong&gt; — review the compression errors users report and improve the current beta&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tokenpinch.com" rel="noopener noreferrer"&gt;tokenpinch.com&lt;/a&gt; — free, no signup, runs entirely in your browser&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Spec:&lt;/strong&gt; &lt;a href="https://github.com/javiercast/pinch-format/blob/main/README.md" rel="noopener noreferrer"&gt;github.com/javiercast/pinch-format&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you work with structured data and LLMs regularly, I'd love to hear whether this solves a real problem for you — or what's missing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built over a few sessions with Claude's help. Yes, I used an AI to build a tool for using AI more efficiently. The irony is not lost on me.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>data</category>
      <category>llm</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
