DEV Community

Discussion on: TOON vs JSON: The New Format Designed for AI

david duymelinck

Why not use YAML if you want a more condensed format?

Marcu Loreto

As I understand it, TOON is one more compact data serialization format, similar in purpose to YAML.

david duymelinck

True, but why the need to invent a new format?
Most languages have mature YAML and CSV libraries if you need to condense the text that is sent to an AI.

The main reason for TOON is probably that you can feed it both content that YAML compacts better and content that CSV compacts better, instead of writing a function that switches the output between the two formats yourself.
Out of curiosity I asked an AI to create that function.

/**
 * Detects whether a JSON object represents a tree‑like structure
 * or a tabular (array‑of‑objects) structure, then converts it to
 * YAML or CSV accordingly.
 *
 * @param {object|array} data - Parsed JSON data.
 * @returns {string} YAML string for tree‑like data or CSV string for tabular data.
 */
function convertJson(data) {
  // Helper: check if value is a plain object (not array, not null)
  const isObject = v => v && typeof v === 'object' && !Array.isArray(v);

  // Detect tabular: an array where every element is an object
  // and each object has the same set of primitive keys.
  const isTabular = arr => {
    if (!Array.isArray(arr) || arr.length === 0) return false;
    // The first row defines the columns, so it must itself be a plain object
    // (Object.keys(null) would throw a TypeError).
    if (!isObject(arr[0])) return false;
    const firstKeys = Object.keys(arr[0]);
    if (firstKeys.length === 0) return false;

    return arr.every(item => {
      if (!isObject(item)) return false;
      const keys = Object.keys(item);
      // same keys as first row
      if (keys.length !== firstKeys.length) return false;
      for (let k of firstKeys) {
        if (!keys.includes(k)) return false;
        // values should be primitive (string, number, boolean, null)
        const v = item[k];
        if (v && typeof v === 'object') return false;
      }
      return true;
    });
  };

  // Convert tabular data to CSV
  const toCsv = arr => {
    const headers = Object.keys(arr[0]);
    const escape = v => {
      if (v == null) return '';
      const s = String(v);
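      // Quote fields containing a comma, a double quote, or any line break (LF or CR)
      return s.includes(',') || s.includes('"') || s.includes('\n') || s.includes('\r')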
        ? `"${s.replace(/"/g, '""')}"`
        : s;
    };
    const rows = arr.map(row => headers.map(h => escape(row[h])).join(','));
    return [headers.join(','), ...rows].join('\n');
  };

  // Convert any object/array to YAML (simple implementation)
  const toYaml = obj => {
    const yaml = require('js-yaml'); // assumes js-yaml is available
    return yaml.dump(obj, { noRefs: true, indent: 2 });
  };

  // Main logic
  if (Array.isArray(data) && isTabular(data)) {
    return toCsv(data);
  } else {
    // For tree‑like structures we fall back to YAML
    // If js-yaml is not available, a minimal serializer could be used.
    try {
      return toYaml(data);
    } catch (e) {
      // Minimal fallback YAML serializer
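      const serialize = (value, indent = 0) => {
        const pad = ' '.repeat(indent);
        if (Array.isArray(value)) {
          return value.map(v => `${pad}- ${serialize(v, indent + 2).trimStart()}`).join('\n');
        } else if (isObject(value)) {
          return Object.entries(value)
            .map(([k, v]) => {
              // Nested containers must start on their own line to be valid YAML;
              // scalars stay on the same line as the key.
              // (Empty containers and string quoting are not handled here.)
              const nested = v && typeof v === 'object';
              return nested
                ? `${pad}${k}:\n${serialize(v, indent + 2)}`
                : `${pad}${k}: ${serialize(v)}`;
            })
            .join('\n');
        } else {
          return `${value}`;
        }
      };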
      return serialize(data);
    }
  }
}
 
Shawn Bullock

Our system processes billions of tokens each month from the database alone, so we can't use YAML or JSON; they are too token-heavy. For flat results (like a database query) we simply return CSV, which is even 30% fewer tokens than TOON. We haven't adopted TOON yet (we have our own format for structured objects), but it's definitely more token-friendly than YAML. At the scale we operate, YAML is expensive.
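
To make the flat-result point concrete, here is a rough sketch that serializes the same rows as JSON and as CSV and compares their sizes. The sample rows and the `toCsv` helper are made up for illustration, and character count is only a stand-in for token count; actual savings depend on the tokenizer of the model in use.

```javascript
// Rough size comparison for flat (tabular) data.
// Assumption: character count is only a proxy for token count;
// real numbers depend on the model's tokenizer.
const rows = [
  { id: 1, name: 'Alice', age: 30 },
  { id: 2, name: 'Bob', age: 25 },
  { id: 3, name: 'Carol', age: 41 },
];

// CSV writes the keys once, in a header row, instead of once per record.
function toCsv(records) {
  const headers = Object.keys(records[0]);
  const quote = v => {
    const s = v == null ? '' : String(v);
    // Quote fields that contain a comma, a double quote, or a line break.
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = records.map(r => headers.map(h => quote(r[h])).join(','));
  return [headers.join(','), ...lines].join('\n');
}

const asJson = JSON.stringify(rows);
const asCsv = toCsv(rows);

console.log('JSON chars:', asJson.length);
console.log('CSV chars: ', asCsv.length);
console.log('CSV/JSON ratio:', (asCsv.length / asJson.length).toFixed(2));
```

The gap widens with more rows, because JSON repeats every key in every record while CSV states the keys once.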

david duymelinck

I based my statement on the examples I had seen at that point. And I agree that YAML is expensive if you use it for tabular data. That is why I mentioned in my other comment a switch based on the shape of the data: CSV for tabular data and YAML for hierarchical data.
You can even embed CSV in YAML.

people: |
  id,name,age
  1,Alice,30
  2,Bob,25

If that wasn't possible, I would go all in on TOON for those mixed cases.
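
A minimal sketch of producing such a mixed payload without any library, for what it's worth. `toMixedYaml`, `isFlatRow` and `toCsvLines` are hypothetical helpers, the payload is invented, and the code assumes keys and values are simple enough that neither YAML quoting nor CSV escaping is needed, and that all rows share the keys of the first row.

```javascript
// Hypothetical helpers: emit one level of YAML and embed any array of
// flat objects as CSV inside a literal block scalar (`|`).
// Assumptions: no value needs YAML quoting, no CSV cell contains a comma,
// quote or newline, and every row has the same keys as the first row.
const isFlatRow = r =>
  r !== null && typeof r === 'object' && !Array.isArray(r) &&
  Object.values(r).every(v => v === null || typeof v !== 'object');

function toCsvLines(records) {
  const headers = Object.keys(records[0]);
  const rows = records.map(r => headers.map(h => r[h] ?? '').join(','));
  return [headers.join(','), ...rows];
}

function toMixedYaml(doc) {
  return Object.entries(doc)
    .map(([key, value]) => {
      if (Array.isArray(value) && value.length > 0 && value.every(isFlatRow)) {
        // Indent every CSV line by two spaces so it belongs to the block scalar.
        const body = toCsvLines(value).map(line => `  ${line}`).join('\n');
        return `${key}: |\n${body}`;
      }
      // Anything else is written as a plain scalar in this sketch.
      return `${key}: ${value}`;
    })
    .join('\n');
}

const payload = {
  source: 'crm',
  people: [
    { id: 1, name: 'Alice', age: 30 },
    { id: 2, name: 'Bob', age: 25 },
  ],
};

console.log(toMixedYaml(payload));
```

On the receiving side the `people` value is just a string to a YAML parser, so it still has to be run through a CSV parser separately.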

With hierarchical data, TOON is essentially YAML, so it doesn't reduce tokens there. And as you mention, CSV is better for tabular data.

If you can show where TOON is saving tokens over the smart use of YAML and CSV, I'm glad to stand corrected.