DEV Community

Arch-AI-tech

Ktav: I got fed up with every config format, so I built one with no quotes, no commas, no indentation hell

This started as a rage-quit from config files.

I was hacking on a hobby project — a SOCKS5 proxy rotator — and every time I needed to tweak launch configs, I'd lose time fighting the format itself instead of the actual problem. After years of being mildly annoyed at every config format I tried, I finally snapped and built my own.

It's called Ktav (כְּתָב, Hebrew for "script/writing"). This article is the long version: what it is, why every existing format made me give up, and the design decisions behind the parser, the FFI strategy across seven languages, and the editor tooling. It's open-source, dual-licensed MIT OR Apache-2.0, and the playground is in the browser if you'd rather just try it.

The graveyard of formats I tried

Before writing one more line of code, let me explain why none of the existing options worked.

.env

Fine for flat key=value, useless the moment you need a nested object or an array. I ended up writing things like:

UPSTREAM_0_HOST=a.example
UPSTREAM_0_PORT=1080
UPSTREAM_0_WEIGHT=0.7
UPSTREAM_1_HOST=b.example
UPSTREAM_1_PORT=1080
UPSTREAM_1_WEIGHT=0.3

It's not even configuration — it's a hand-rolled index encoding. Adding a third upstream means inventing a new convention every time.
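To make the "hand-rolled index encoding" concrete, here is a rough sketch of the decoder such a convention forces on the application. It is not taken from the rotator project; the `Upstream` struct and the `decode_upstreams` function are hypothetical names, and the sketch assumes the `UPSTREAM_<index>_<FIELD>` layout shown above.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, PartialEq)]
struct Upstream {
    host: String,
    port: u16,
    weight: f64,
}

/// Rebuilds the upstream list from UPSTREAM_<index>_<FIELD> pairs.
/// Hypothetical helper: the naming convention is the one from the .env example.
fn decode_upstreams(vars: &[(&str, &str)]) -> Result<Vec<Upstream>, String> {
    let mut by_index: BTreeMap<usize, Upstream> = BTreeMap::new();
    for (key, value) in vars {
        // Keys that do not follow the convention are skipped.
        let Some(rest) = key.strip_prefix("UPSTREAM_") else { continue };
        let Some((index, field)) = rest.split_once('_') else { continue };
        let index: usize = index.parse().map_err(|_| format!("bad index in {key}"))?;
        let entry = by_index.entry(index).or_default();
        match field {
            "HOST" => entry.host = value.to_string(),
            "PORT" => entry.port = value.parse().map_err(|_| format!("bad port in {key}"))?,
            "WEIGHT" => entry.weight = value.parse().map_err(|_| format!("bad weight in {key}"))?,
            _ => return Err(format!("unknown field in {key}")),
        }
    }
    // BTreeMap iterates in key order, so upstreams come out sorted by index.
    Ok(by_index.into_values().collect())
}

fn main() {
    let vars = [
        ("UPSTREAM_0_HOST", "a.example"),
        ("UPSTREAM_0_PORT", "1080"),
        ("UPSTREAM_0_WEIGHT", "0.7"),
        ("UPSTREAM_1_HOST", "b.example"),
        ("UPSTREAM_1_PORT", "1080"),
        ("UPSTREAM_1_WEIGHT", "0.3"),
    ];
    let upstreams = decode_upstreams(&vars).expect("valid example input");
    println!("{upstreams:?}");
}
```

All of this is parsing logic that lives in the application rather than in the format, which is the complaint.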

INI / TOML

Sections help, but TOML's [[array.of.tables]] syntax always made me pause and re-read the docs. Inline tables can't span multiple lines. Every time I thought "this should be simple" it wasn't.

[[upstreams]]
host = "a.example"
port = 1080
weight = 0.7

[[upstreams]]
host = "b.example"
port = 1080
weight = 0.3

TOML is great for Cargo.toml-style configs where you know the schema and the structure is mostly flat. For hand-edited configs with arbitrary nesting, it always felt like I was working around the syntax instead of with it.

JSON

JSON's data model is exactly what I want. Scalars, arrays, objects, null, booleans. Composable. Unambiguous. But typing it by hand?

{
    "port": 20082,
    "log_level": "info",
    "upstreams": [
        {
            "host": "a.example",
            "port": 1080,
            "weight": 0.7
        },
        {
            "host": "b.example",
            "port": 1080,
            "weight": 0.3
        }
    ]
}

Quotes around every key. Quotes around every string. Commas after every line. Trailing comma equals parse error. Forget a comma anywhere and the error message is on the wrong line. I was spending more time on punctuation than on config values.

JSON5

Better: unquoted keys, trailing commas, comments. But string values still need quotes, and commas between items are still mandatory. It softens the pain without removing it.

YAML

I genuinely tried YAML. Multiple times. But I would constantly lose track of where I was in the indentation. A misaligned space silently changes the structure. I'd paste a block, the indent shifts, and suddenly my array is a string. I don't have the spatial reasoning for YAML, apparently, and judging by how often I see YAML horror stories, neither does half the industry.

The Norway problem (country: NO parsed as false under YAML 1.1), sexagesimal numbers, an implicit type system that makes 123 an int but 123.0.0 a string: YAML's flexibility is its own footgun.

So what is Ktav?

Take JSON's data model. Strip the ceremony.

## A SOCKS5 rotator config.
port: 20082
log_level: info
debug: true

upstreams: [
    {
        host: a.example
        port: 1080
        weight: 0.7
    }
    {
        host: b.example
        port: 1080
        weight: 0.3
    }
]

## '::' forces a literal string — "true" stays a String, not a Bool.
feature_flag:: true
zip_code:: 00544

## Multiline strings — leading indent is auto-trimmed.
motd: (
    Welcome to the node.
    Please behave.
)

That's it. No quotes around keys. No quotes around strings. No commas between items. No indentation-sensitivity (the indent above is cosmetic; the parser ignores it). Bare numbers auto-type as Integer or Float; true/false/null are keywords; everything else is a String, verbatim.

The data model is exactly JSON's. Anything you can express in JSON, you can express in Ktav. The reverse is also true — you can roundtrip JSON ⇄ Ktav cleanly.
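One way to picture that data model is as a small value type plus a JSON printer. The sketch below is illustrative only: the `Value` enum and the `to_json` function are hypothetical names, not the types exposed by the ktav crate, and it only shows the Ktav-to-JSON direction.

```rust
/// Hypothetical in-memory value type mirroring the model described above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>), // insertion-ordered key/value pairs
}

/// Quotes and escapes a string the way JSON requires.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders a value as compact JSON. Non-finite floats are not handled here.
fn to_json(v: &Value) -> String {
    match v {
        Value::Null => "null".to_string(),
        Value::Bool(b) => b.to_string(),
        Value::Integer(i) => i.to_string(),
        // {:?} keeps a trailing ".0", so 1.0 does not collapse into 1.
        Value::Float(f) => format!("{f:?}"),
        Value::String(s) => escape(s),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        Value::Object(pairs) => {
            let parts: Vec<String> = pairs
                .iter()
                .map(|(k, v)| format!("{}:{}", escape(k), to_json(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

fn main() {
    let config = Value::Object(vec![
        ("port".to_string(), Value::Integer(20082)),
        ("log_level".to_string(), Value::String("info".to_string())),
        ("debug".to_string(), Value::Bool(true)),
    ]);
    println!("{}", to_json(&config));
}
```

The only departure from plain JSON in this sketch is that numbers are split into Integer and Float, as the format description says.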

Design decisions worth explaining

Every absent feature is a decision. Here's why the format looks the way it does.

1. Typing by lexical form, not by quoting

A bare number that looks like an integer (20082) becomes Integer. A bare number that looks like a float (0.7) becomes Float. Everything else — including digit-ish content like a version 1.2.3-rc1, a regex \d+\.example, or info — is a String, verbatim.

Trade-off: simple to reason about (no need for type hints), but it introduces the need for an explicit "I want this as a string" marker for ambiguous cases — see (3).
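As a rough illustration of "typing by lexical form", the rule can be sketched as a classifier over the raw text of a value. This is a simplified stand-in rather than the reference parser: the `Scalar` enum and `classify` function are hypothetical, and the exact number grammar (only an optional leading minus, no exponents, overflow falling back to a string) is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Scalar {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    Str(String),
}

/// True when `s` is an optional '-' followed by one or more ASCII digits.
fn looks_like_integer(s: &str) -> bool {
    let digits = s.strip_prefix('-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// True when `s` is <integer>.<digits> with exactly one dot.
fn looks_like_float(s: &str) -> bool {
    match s.split_once('.') {
        Some((int_part, frac)) => {
            looks_like_integer(int_part)
                && !frac.is_empty()
                && frac.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}

/// Picks a type from the shape of the text alone; anything unrecognized stays a string.
fn classify(raw: &str) -> Scalar {
    let as_string = || Scalar::Str(raw.to_string());
    match raw {
        "null" => Scalar::Null,
        "true" => Scalar::Bool(true),
        "false" => Scalar::Bool(false),
        // Assumption: an integer too large for i64 falls back to a string.
        _ if looks_like_integer(raw) => raw.parse::<i64>().map(Scalar::Integer).unwrap_or_else(|_| as_string()),
        _ if looks_like_float(raw) => raw.parse::<f64>().map(Scalar::Float).unwrap_or_else(|_| as_string()),
        _ => as_string(),
    }
}

fn main() {
    for raw in ["20082", "0.7", "true", "1.2.3-rc1", "info", "00544"] {
        println!("{raw:>10} -> {:?}", classify(raw));
    }
}
```

In this sketch a bare 00544 is read as an integer and loses its leading zero, which is the kind of case an explicit string marker has to cover.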

2. ## for comments instead of #

Single # is too common as content — hex colors, issue references, channel names, shebangs. Requiring ## for comments means color: #ff5577 parses without escaping.

Trade-off: two characters instead of one, but zero ambiguity between content and comment.
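The comment rule is small enough to sketch in a few lines. The version below assumes a comment is a whole line whose first non-blank characters are ##; whether Ktav also accepts comments after a value is not something this sketch takes a position on. `is_comment` and `strip_comments` are hypothetical helper names.

```rust
/// Assumption: a comment is a line whose first non-blank characters are "##".
fn is_comment(line: &str) -> bool {
    line.trim_start().starts_with("##")
}

/// Drops comment lines and blank lines, keeping every other line untouched.
fn strip_comments(text: &str) -> Vec<&str> {
    text.lines()
        .filter(|l| !is_comment(l) && !l.trim().is_empty())
        .collect()
}

fn main() {
    let text = "## theme settings\ncolor: #ff5577\nchannel: #general\n";
    for line in strip_comments(text) {
        // Lines containing a single '#' pass through as ordinary content.
        println!("{line}");
    }
}
```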

3. :: as a "forced literal string" marker

When the lexical-form typing would mis-classify, :: says "the entire value, as-is, is a String":

feature_flag:: true       ## String "true", not Bool
zip_code:: 00544          ## String "00544", leading zero preserved
version:: 1.2.3           ## String "1.2.3" (already a String under rule 1; :: makes it explicit)

Trade-off: it's a second sigil to learn, but it's a clean escape hatch without re-introducing JSON-style quoting for every value.
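A minimal sketch of how a line-level splitter might tell : from :: is below. It is an illustration under stated assumptions, not the reference parser: `Pair` and `split_pair` are hypothetical names, keys are assumed not to contain a colon, and the value is assumed to be everything after the separator with surrounding whitespace trimmed.

```rust
#[derive(Debug, PartialEq)]
struct Pair<'a> {
    key: &'a str,
    /// True when the separator was "::", meaning the value must stay a string.
    forced_string: bool,
    raw_value: &'a str,
}

/// Splits "key: value" or "key:: value"; returns None when there is no key or separator.
fn split_pair(line: &str) -> Option<Pair<'_>> {
    let (key, rest) = line.split_once(':')?;
    let key = key.trim();
    if key.is_empty() {
        return None;
    }
    // A second ':' directly after the first one marks a forced literal string.
    let (forced_string, rest) = match rest.strip_prefix(':') {
        Some(after) => (true, after),
        None => (false, rest),
    };
    Some(Pair { key, forced_string, raw_value: rest.trim() })
}

fn main() {
    for line in ["port: 20082", "feature_flag:: true", "zip_code:: 00544"] {
        println!("{:?}", split_pair(line));
    }
}
```

A caller would skip type inference entirely whenever `forced_string` is set and keep `raw_value` as-is.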

4. Multi-line strings via ( ... ) with auto-dedent

YAML's | and > block scalars are powerful but I find them under-discoverable. Ktav uses parentheses, and the common leading indent of the block is auto-trimmed so you can indent the body of the string to match its surroundings:

motd: (
    Welcome to the node.
    Please behave.
)

The value is "Welcome to the node.\nPlease behave." — the four-space indent is recognized as cosmetic and stripped.

Trade-off: less expressive than YAML's full block-scalar grammar (no |-, |+, fold/strip variants), but covers the 90% case with one rule.
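The auto-dedent rule can be sketched as "find the smallest indent among non-blank lines, then strip that many characters from each line". The `dedent` function below is a hypothetical illustration that takes the lines between the parentheses as input; it assumes indentation consists of spaces and tabs only and that blank lines become empty.

```rust
/// Number of leading space/tab bytes on a line.
fn leading_ws(line: &str) -> usize {
    line.len() - line.trim_start_matches(|c: char| c == ' ' || c == '\t').len()
}

/// Removes the common leading indent from every non-blank line of `body`.
fn dedent(body: &str) -> String {
    let indent = body
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(leading_ws)
        .min()
        .unwrap_or(0);
    body.lines()
        // Every non-blank line has at least `indent` ASCII whitespace bytes, so the slice is safe.
        .map(|l| if l.trim().is_empty() { "" } else { &l[indent..] })
        .collect::<Vec<&str>>()
        .join("\n")
}

fn main() {
    let body = "    Welcome to the node.\n    Please behave.";
    assert_eq!(dedent(body), "Welcome to the node.\nPlease behave.");
    println!("{}", dedent(body));
}
```

Indentation deeper than the common prefix survives, so nested text inside the block keeps its relative shape.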

5. Dotted keys

a.b.c: value is sugar for {a: {b: {c: value}}}. Optional — you can always use explicit {} if you prefer:

node.host: a.example
node.port: 1080
node.auth.user: alice

It's the only place where the format adds a convenience that isn't strictly necessary. I went back and forth on whether to include it.
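Desugaring a dotted key amounts to walking a nested map and creating objects along the way. The sketch below shows that walk with a deliberately reduced model where every leaf is a string; `Node` and `insert_dotted` are hypothetical names, and treating "path segment already holds a scalar" as an error is an assumption about conflict handling, not documented Ktav behavior.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Node {
    Leaf(String),
    Object(BTreeMap<String, Node>),
}

/// Inserts `value` at the path named by `dotted_key`, creating objects as needed.
fn insert_dotted(
    root: &mut BTreeMap<String, Node>,
    dotted_key: &str,
    value: &str,
) -> Result<(), String> {
    let mut segments: Vec<&str> = dotted_key.split('.').collect();
    let last = segments.pop().expect("split always yields at least one segment");
    let mut current = root;
    for segment in segments {
        let child = current
            .entry(segment.to_string())
            .or_insert_with(|| Node::Object(BTreeMap::new()));
        current = match child {
            Node::Object(map) => map,
            // Assumption: descending through a scalar is reported as a conflict.
            Node::Leaf(_) => return Err(format!("'{segment}' already holds a scalar")),
        };
    }
    current.insert(last.to_string(), Node::Leaf(value.to_string()));
    Ok(())
}

fn main() {
    let mut root = BTreeMap::new();
    for (key, value) in [("node.host", "a.example"), ("node.port", "1080"), ("node.auth.user", "alice")] {
        insert_dotted(&mut root, key, value).expect("no conflicts in this example");
    }
    println!("{root:#?}");
}
```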

What's deliberately not in the format

  • No anchors / references (YAML's & and *)
  • No type tags
  • No expressions, interpolation, or includes
  • No schema language
  • No "JSON super-set" claim — Ktav is its own format with its own parser

Every absent feature is one I considered and rejected. Most of them would push the parser past the "one evening to implement" complexity budget I wanted to preserve, which matters for the next part.

It's Rust all the way down

The reference parser is in Rust. Hand-written recursive descent, no parser generator, zero-copy where possible. Speed is comparable to serde_json on typical config-sized inputs.

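use serde::Deserialize;

#[derive(Deserialize)]
struct Config {
    port: u16,
    log_level: String,
    upstreams: Vec<Upstream>,
}

#[derive(Deserialize)]
struct Upstream {
    host: String,
    port: u16,
    weight: f64,
}

// Assumes ktav's error type implements std::error::Error, so `?` can box it.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "config.ktav" is a placeholder path for this example.
    let text = std::fs::read_to_string("config.ktav")?;
    let config: Config = ktav::from_str(&text)?;
    println!("{} upstream(s), listening on port {}", config.upstreams.len(), config.port);
    Ok(())
}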

Serde support is native — no separate serializer crate, no glue code.

The FFI strategy across seven languages

This is the part I'm most curious to hear feedback on.

Bindings for JavaScript / TypeScript, Python, Go, PHP, Java, and C# all wrap the same Rust core via FFI. One parser implementation, one behavior, seven languages (the six bindings plus Rust itself). Each binding ships prebuilt binaries for Linux/macOS/Windows so consumers don't have to compile anything.

Language  | FFI mechanism                            | Distribution
----------|------------------------------------------|----------------------------------------
JS / TS   | N-API (native) + WebAssembly (fallback)  | npm install @ktav-lang/ktav
Python    | PyO3 + abi3 wheels                       | pip install ktav
Go        | purego (no cgo for consumers)            | go get github.com/ktav-lang/golang
PHP       | FFI (PHP 7.4+ ext-ffi)                   | composer require ktav-lang/ktav
Java      | JNA (no JNI for consumers)               | Maven Central: io.github.ktav-lang:ktav
C# / .NET | P/Invoke                                 | dotnet add package Ktav

The hard parts were:

  • Designing a stable C ABI that doesn't leak Rust types. Strings cross the boundary as length-prefixed byte slices; everything else is opaque handles with explicit _free functions.
  • Memory ownership semantics — every binding had to learn the same rule: "the parser allocates, the language frees via the free function". Documented once, repeated everywhere.
  • Error propagation — Rust's Result becomes a tagged union at the FFI layer, then each language wraps it in its idiomatic equivalent (exceptions in Python/Java/C#, errors in Go, rejected promises in JS).
  • Conformance tests — ~180 tests that every binding runs against. If a binding diverges from the spec, CI catches it. This was the single most valuable investment in the project.
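The first three points can be sketched together as a tiny C-compatible surface: bytes in as pointer plus length, an opaque handle or an error out, and explicit free functions for everything the Rust side allocates. This is not Ktav's actual ABI. Every name here (`Doc`, `ParseResult`, `demo_parse`, `demo_doc_entries`, `demo_doc_free`, `demo_error_free`) is hypothetical, the "parser" only counts non-blank lines, and a real export would additionally need `no_mangle` and a cdylib build.

```rust
use std::ptr;

/// Opaque handle: bindings only ever hold a pointer to it.
pub struct Doc {
    entries: usize, // stand-in for a parsed tree
}

/// Result as a tagged struct: `tag` says which of the other fields is valid.
#[repr(C)]
pub struct ParseResult {
    pub tag: u32,         // 0 = ok, 1 = error
    pub doc: *mut Doc,    // valid only when tag == 0, otherwise null
    pub err_ptr: *mut u8, // valid only when tag == 1: UTF-8 bytes, not NUL-terminated
    pub err_len: usize,
}

/// # Safety
/// `data` must point to `len` readable bytes.
pub unsafe extern "C" fn demo_parse(data: *const u8, len: usize) -> ParseResult {
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    match std::str::from_utf8(bytes) {
        Ok(text) => {
            let entries = text.lines().filter(|l| !l.trim().is_empty()).count();
            let doc = Box::into_raw(Box::new(Doc { entries }));
            ParseResult { tag: 0, doc, err_ptr: ptr::null_mut(), err_len: 0 }
        }
        Err(e) => {
            // Ownership of the message bytes moves to the caller until demo_error_free.
            let msg = e.to_string().into_bytes().into_boxed_slice();
            let err_len = msg.len();
            let err_ptr = Box::into_raw(msg) as *mut u8;
            ParseResult { tag: 1, doc: ptr::null_mut(), err_ptr, err_len }
        }
    }
}

/// # Safety
/// `doc` must be a live pointer returned by `demo_parse`.
pub unsafe extern "C" fn demo_doc_entries(doc: *const Doc) -> usize {
    unsafe { (*doc).entries }
}

/// # Safety
/// `doc` must come from `demo_parse` and must not be used afterwards.
pub unsafe extern "C" fn demo_doc_free(doc: *mut Doc) {
    if !doc.is_null() {
        drop(unsafe { Box::from_raw(doc) });
    }
}

/// # Safety
/// `err_ptr`/`err_len` must come from one `ParseResult` and be freed only once.
pub unsafe extern "C" fn demo_error_free(err_ptr: *mut u8, err_len: usize) {
    if !err_ptr.is_null() {
        let slice = ptr::slice_from_raw_parts_mut(err_ptr, err_len);
        drop(unsafe { Box::from_raw(slice) });
    }
}

fn main() {
    let text = "port: 20082\nlog_level: info\n";
    let result = unsafe { demo_parse(text.as_ptr(), text.len()) };
    assert_eq!(result.tag, 0);
    println!("entries: {}", unsafe { demo_doc_entries(result.doc) });
    unsafe { demo_doc_free(result.doc) };
}
```

Each language binding would wrap the tag check in its own idiom (an exception, an error value, a rejected promise) and call the matching free function when it is done with the handle or the message.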

The WebAssembly build of the same Rust crate also powers the online playground — you can paste JSON, YAML, TOML, or INI and see the Ktav equivalent in your browser. Everything runs locally; nothing is sent to a server.

Editor tooling

Because a config format without editor support is just a frustration generator.

  • LSP server in Rust (separate ktav-lsp crate) — diagnostics, completions, hover info, go-to-definition for dotted keys.
  • VS Code plugin — bundles the LSP, syntax highlighting via TextMate grammar.
  • JetBrains plugin (IntelliJ, CLion, RustRover, WebStorm, PyCharm, GoLand, PhpStorm, Rider) — bundles the LSP, syntax highlighting, indentation support.
  • tree-sitter grammar — drop into Neovim, Helix, Zed, or anything else with tree-sitter support.

The LSP catches the things the parser would catch at runtime, but inline as you type: type mismatches at the lexical level, unmatched brackets, malformed multi-line strings.
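As an example of the kind of check involved, an unmatched-bracket diagnostic can be sketched with a stack of open brackets. This is a simplification and not the ktav-lsp implementation: `Diagnostic` and `check_brackets` are hypothetical names, and the sketch assumes an opener is a line ending in [ or {, a closer is a line consisting only of ] or }, and it ignores multi-line ( ... ) strings entirely.

```rust
#[derive(Debug, PartialEq)]
struct Diagnostic {
    line: usize, // 1-based line number
    message: String,
}

/// Reports closers that do not match the innermost opener and openers that never close.
fn check_brackets(text: &str) -> Vec<Diagnostic> {
    let mut stack: Vec<(char, usize)> = Vec::new();
    let mut out = Vec::new();
    for (idx, raw) in text.lines().enumerate() {
        let line_no = idx + 1;
        let line = raw.trim();
        if line.starts_with("##") {
            continue; // comment lines never open or close anything
        }
        match line {
            "]" | "}" => {
                let closer = if line == "]" { ']' } else { '}' };
                let expected = if closer == ']' { '[' } else { '{' };
                match stack.pop() {
                    Some((open, _)) if open == expected => {}
                    Some((open, opened_at)) => out.push(Diagnostic {
                        line: line_no,
                        message: format!("'{closer}' closes '{open}' opened on line {opened_at}"),
                    }),
                    None => out.push(Diagnostic {
                        line: line_no,
                        message: format!("unexpected '{closer}'"),
                    }),
                }
            }
            _ => {
                if let Some(open) = line.chars().last().filter(|c| *c == '[' || *c == '{') {
                    stack.push((open, line_no));
                }
            }
        }
    }
    for (open, opened_at) in stack {
        out.push(Diagnostic { line: opened_at, message: format!("'{open}' is never closed") });
    }
    out
}

fn main() {
    let broken = "upstreams: [\n    {\n        host: a.example\n]\n";
    for d in check_brackets(broken) {
        println!("line {}: {}", d.line, d.message);
    }
}
```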

Honest caveats

I should be upfront about what this is and isn't.

  • It's young. The spec is at 0.6.x and the format is still evolving, though I don't expect breaking changes from here.
  • No production users I know of besides me. I built it because I wanted it. The ecosystem grew because wrapping one Rust core in FFI turned out to be much more tractable than I expected.
  • It doesn't replace TOML for Cargo. It doesn't replace YAML for Kubernetes. Those formats have entire ecosystems built on them and Ktav has zero. I'm not trying to displace anything — I'm offering a different trade-off for people who, like me, want JSON's flexibility without JSON's ceremony.
  • Parsing speed is comparable to serde_json for typical configs (< 100 KB), but I'm not claiming it beats simd-json on 500 MB inputs. It's a config format, not a data interchange format.

What exists today

All open-source, dual-licensed MIT OR Apache-2.0.

Everything lives under the ktav-lang organization on GitHub.

What I'd love feedback on

  1. Is :: the right shape for "forced literal string"? Alternatives I considered: :' (looks like a quote), :" (same), :s (explicit type), := (looks like assignment in some languages). I picked :: because it feels like "the same marker, doubled" — but I'm not sure.

  2. Should multi-line strings preserve trailing whitespace? Right now it is trimmed. I went back and forth.

  3. Are dotted keys worth their complexity? They're nice when present but they're the only "two ways to do the same thing" feature in the format.

  4. The FFI-everywhere strategy — am I underestimating maintenance cost? Right now ~180 conformance tests catch divergences in CI. Is there a scale at which this breaks down?

If you read this far, thank you. Issues and PRs are welcome anywhere in the org, and so is any feedback that helps me understand what trade-offs to make next.

(Drafted with assistance from an LLM editor)
