This started as a rage-quit from config files.
I was hacking on a hobby project — a SOCKS5 proxy rotator — and every time I needed to tweak launch configs, I'd lose time fighting the format itself instead of the actual problem. After years of being mildly annoyed at every config format I tried, I finally snapped and built my own.
It's called Ktav (כְּתָב, Hebrew for "script/writing"). This article is the long version of what it is, why every existing format made me give up, and the design decisions that went into the parser, the FFI strategy across seven languages, and the editor tooling. It's open-source, dual-licensed MIT OR Apache-2.0, and the playground is in the browser if you'd rather just try it.
The graveyard of formats I tried
Before writing one more line of code, let me explain why none of the existing options worked.
.env
Fine for flat key=value, useless the moment you need a nested object or an array. I ended up writing things like:
UPSTREAM_0_HOST=a.example
UPSTREAM_0_PORT=1080
UPSTREAM_0_WEIGHT=0.7
UPSTREAM_1_HOST=b.example
UPSTREAM_1_PORT=1080
UPSTREAM_1_WEIGHT=0.3
It's not even configuration — it's a hand-rolled index encoding. Adding a third upstream means inventing a new convention every time.
INI / TOML
Sections help, but TOML's [[array.of.tables]] syntax for arrays of tables always made me pause and re-read the docs. Inline tables can't span multiple lines. Every time I thought "this should be simple" it wasn't.
[[upstreams]]
host = "a.example"
port = 1080
weight = 0.7
[[upstreams]]
host = "b.example"
port = 1080
weight = 0.3
TOML is great for Cargo.toml-style configs where you know the schema and the structure is mostly flat. For hand-edited configs with arbitrary nesting, it always felt like I was working around the syntax instead of with it.
JSON
JSON's data model is exactly what I want. Scalars, arrays, objects, null, booleans. Composable. Unambiguous. But typing it by hand?
{
  "port": 20082,
  "log_level": "info",
  "upstreams": [
    {
      "host": "a.example",
      "port": 1080,
      "weight": 0.7
    },
    {
      "host": "b.example",
      "port": 1080,
      "weight": 0.3
    }
  ]
}
Quotes around every key. Quotes around every string. Commas after every line. Trailing comma equals parse error. Forget a comma anywhere and the error message is on the wrong line. I was spending more time on punctuation than on config values.
JSON5
Better — relaxed quoting, trailing commas allowed, comments. But strings with special characters still need quoting, and in most tooling commas are still mandatory between items. It softens the pain without removing it.
YAML
I genuinely tried YAML. Multiple times. But I would constantly lose track of where I was in the indentation. A misaligned space silently changes the structure. I'd paste a block, the indent shifts, and suddenly my array is a string. I don't have the spatial reasoning for YAML, apparently — and apparently neither does half the industry, given how often I see YAML horror stories.
The Norway problem (country: NO parsed as false), the sexagesimal floats, the implicit type system that makes 123 an int but 123.0.0 a string — YAML's flexibility is its own footgun.
So what is Ktav?
Take JSON's data model. Strip the ceremony.
## A SOCKS5 rotator config.
port: 20082
log_level: info
debug: true
upstreams: [
    {
        host: a.example
        port: 1080
        weight: 0.7
    }
    {
        host: b.example
        port: 1080
        weight: 0.3
    }
]
## '::' forces a literal string — "true" stays a String, not a Bool.
feature_flag:: true
zip_code:: 00544
## Multiline strings — leading indent is auto-trimmed.
motd: (
    Welcome to the node.
    Please behave.
)
That's it. No quotes around keys. No quotes around strings. No commas between items. No indentation-sensitivity (the indent above is cosmetic; the parser ignores it). Bare numbers auto-type as Integer or Float; true/false/null are keywords; everything else is a String, verbatim.
The data model is exactly JSON's. Anything you can express in JSON, you can express in Ktav. The reverse is also true — you can roundtrip JSON ⇄ Ktav cleanly.
Design decisions worth explaining
Every absent feature is a decision. Here's why the format looks the way it does.
1. Typing by lexical form, not by quoting
A bare number that looks like an integer (20082) becomes Integer. A bare number that looks like a float (0.7) becomes Float. Everything else — including digit-ish content like a version 1.2.3-rc1, a regex \d+\.example, or info — is a String, verbatim.
Trade-off: simple to reason about (no need for type hints), but it introduces the need for an explicit "I want this as a string" marker for ambiguous cases — see (3).
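To make "typing by lexical form" concrete, here is a minimal sketch of such a classifier. It is not the actual Ktav parser: the `Scalar` enum, the `classify` function, and the exact number rules (optional minus sign, ASCII digits, a single dot) are assumptions made for illustration, and the real grammar in the spec may differ.

```rust
/// Scalar kinds from the article's data model (hypothetical names).
#[derive(Debug, PartialEq)]
enum Scalar {
    Null,
    Bool(bool),
    Integer(i64),
    Float(f64),
    Str(String),
}

/// True when `s` is a non-empty run of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Classify a bare value purely by how it looks (simplified, assumed rules).
fn classify(raw: &str) -> Scalar {
    match raw {
        "null" => return Scalar::Null,
        "true" => return Scalar::Bool(true),
        "false" => return Scalar::Bool(false),
        _ => {}
    }
    // Allow one optional leading minus sign before the digits.
    let unsigned = raw.strip_prefix('-').unwrap_or(raw);
    if all_digits(unsigned) {
        if let Ok(n) = raw.parse::<i64>() {
            return Scalar::Integer(n);
        }
    }
    // A float here is digits, one dot, digits. Anything else falls through.
    if let Some((int_part, frac_part)) = unsigned.split_once('.') {
        if all_digits(int_part) && all_digits(frac_part) {
            if let Ok(x) = raw.parse::<f64>() {
                return Scalar::Float(x);
            }
        }
    }
    // Everything else is kept verbatim as a string.
    Scalar::Str(raw.to_string())
}
```

Under these assumed rules, `1.2.3-rc1` and `info` fall through to `Str` with no quoting or type hint, because nothing about their shape matches a number or a keyword.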
2. ## for comments instead of #
Single # is too common as content — hex colors, issue references, channel names, shebangs. Requiring ## for comments means color: #ff5577 parses without escaping.
Trade-off: two characters instead of one, but zero ambiguity between content and comment.
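A small sketch of why the doubled marker keeps `#` usable as content. It assumes a comment is a line whose first non-whitespace characters are `##`; whether the real grammar also allows trailing comments is not covered here, and the function names are hypothetical.

```rust
/// True when the line is a comment under the assumed rule:
/// its first non-whitespace characters are `##`.
fn is_comment_line(line: &str) -> bool {
    line.trim_start().starts_with("##")
}

/// Drop whole-line comments and keep every other line untouched.
fn strip_comment_lines(text: &str) -> Vec<&str> {
    text.lines().filter(|l| !is_comment_line(l)).collect()
}
```

With this rule, a line such as `color: #ff5577` is never mistaken for a comment, because a single `#` carries no special meaning.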
3. :: as a "forced literal string" marker
When the lexical-form typing would mis-classify, :: says "the entire value, as-is, is a String":
feature_flag:: true ## "true" — String, not Bool
zip_code:: 00544 ## "00544" — String, preserves leading zero
version:: 1.2.3 ## "1.2.3" — String, not Float
Trade-off: it's a second sigil to learn, but it's a clean escape hatch without re-introducing JSON-style quoting for every value.
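One way to picture the marker is as a decision made while splitting a line into key and value. The sketch below is an illustration under assumptions, not the reference parser: `Pair` and `split_pair` are hypothetical names, and it assumes whitespace around the key and value is trimmed.

```rust
/// Result of splitting one `key: value` or `key:: value` line (hypothetical type).
#[derive(Debug, PartialEq)]
enum Pair<'a> {
    /// `key: value`, where the value is typed by its lexical form later.
    Typed { key: &'a str, value: &'a str },
    /// `key:: value`, where the value is kept as a String no matter what it looks like.
    Raw { key: &'a str, value: &'a str },
}

/// Split at the first colon; a second colon right after it selects the raw form.
fn split_pair(line: &str) -> Option<Pair<'_>> {
    let (key, rest) = line.split_once(':')?;
    let key = key.trim();
    if key.is_empty() {
        return None;
    }
    match rest.strip_prefix(':') {
        Some(raw) => Some(Pair::Raw { key, value: raw.trim() }),
        None => Some(Pair::Typed { key, value: rest.trim() }),
    }
}
```

The appeal of this shape is that the raw branch never inspects the value at all, so there is nothing to escape.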
4. Multi-line strings via ( ... ) with auto-dedent
YAML's | and > block scalars are powerful but I find them under-discoverable. Ktav uses parentheses, and the common leading indent of the block is auto-trimmed so you can indent the body of the string to match its surroundings:
motd: (
    Welcome to the node.
    Please behave.
)
The value is "Welcome to the node.\nPlease behave." — the four-space indent is recognized as cosmetic and stripped.
Trade-off: less expressive than YAML's full block-scalar grammar (no |-, |+, fold/strip variants), but covers the 90% case with one rule.
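The auto-dedent rule can be sketched in a few lines. This is an illustration rather than the reference implementation: `dedent` and `leading_ws` are hypothetical names, only spaces and tabs count as indent, and blank lines are assumed not to influence the common indent.

```rust
/// Number of leading space or tab bytes on a line.
fn leading_ws(line: &str) -> usize {
    line.bytes().take_while(|b| *b == b' ' || *b == b'\t').count()
}

/// Strip the indent shared by all non-blank body lines, then join with `\n`.
fn dedent(body: &[&str]) -> String {
    // Common indent = smallest indent among lines that have real content.
    let indent = body
        .iter()
        .copied()
        .filter(|l| leading_ws(l) < l.len())
        .map(leading_ws)
        .min()
        .unwrap_or(0);

    body.iter()
        .copied()
        // Blank lines may be shorter than the common indent, so clamp the cut.
        .map(|l| &l[indent.min(leading_ws(l))..])
        .collect::<Vec<_>>()
        .join("\n")
}
```

Only the shared prefix is removed, so extra indentation on individual lines inside the block survives as content.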
5. Dotted keys
a.b.c: value is sugar for {a: {b: {c: value}}}. Optional — you can always use explicit {} if you prefer:
node.host: a.example
node.port: 1080
node.auth.user: alice
It's the only place where the format adds a convenience that isn't strictly necessary. I went back and forth on whether to include it.
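The desugaring can be sketched as a walk that creates intermediate objects on demand. The `Value` type and `insert_dotted` below are hypothetical and deliberately small (strings and objects only); how the real parser treats duplicate or conflicting keys is an assumption here.

```rust
use std::collections::BTreeMap;

/// Cut-down value type for the sketch: strings and nested objects only.
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Object(BTreeMap<String, Value>),
}

/// Insert `value` at a dotted path, creating intermediate objects as needed.
/// Fails if a segment along the way already holds a non-object value.
fn insert_dotted(
    root: &mut BTreeMap<String, Value>,
    path: &str,
    value: Value,
) -> Result<(), String> {
    let mut segments: Vec<&str> = path.split('.').collect();
    let last = segments.pop().expect("split yields at least one segment");

    let mut current = root;
    for seg in segments {
        let entry = current
            .entry(seg.to_string())
            .or_insert_with(|| Value::Object(BTreeMap::new()));
        current = match entry {
            Value::Object(map) => map,
            _ => return Err(format!("`{}` is not an object", seg)),
        };
    }
    // Assumption: a repeated final key simply overwrites the earlier value.
    current.insert(last.to_string(), value);
    Ok(())
}
```

Feeding it `node.host`, `node.port`, and `node.auth.user` in turn builds the same tree as writing the nested `{}` blocks by hand.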
What's deliberately not in the format
- No anchors / references (YAML's & and *)
- No type tags
- No expressions, interpolation, or includes
- No schema language
- No "JSON super-set" claim — Ktav is its own format with its own parser
Every absent feature is one I considered and rejected. Most of them push the parser past the "one evening to implement" complexity I wanted preserved, which matters for the next part.
It's Rust all the way down
The reference parser is in Rust. Hand-written recursive descent, no parser generator, zero-copy where possible. Speed is comparable to serde_json on typical config-sized inputs.
use serde::Deserialize;

#[derive(Deserialize)]
struct Config {
    port: u16,
    log_level: String,
    upstreams: Vec<Upstream>,
}

#[derive(Deserialize)]
struct Upstream {
    host: String,
    port: u16,
    weight: f64,
}

// `text` holds the config source as a string; `?` needs an enclosing
// function that returns a compatible Result.
let config: Config = ktav::from_str(&text)?;
Serde support is native — no separate serializer crate, no glue code.
The FFI strategy across seven languages
This is the part I'm most curious to hear feedback on.
Bindings for JavaScript / TypeScript, Python, Go, PHP, Java, and C# all wrap the same Rust core via FFI. One parser implementation, one behavior, seven languages. Each binding ships prebuilt binaries for Linux/macOS/Windows so consumers don't have to compile anything.
| Language | FFI mechanism | Distribution |
|---|---|---|
| JS / TS | N-API (native) + WebAssembly (fallback) | npm install @ktav-lang/ktav |
| Python | PyO3 + abi3 wheels | pip install ktav |
| Go | purego (no cgo for consumers) | go get github.com/ktav-lang/golang |
| PHP | FFI (PHP 7.4+ ext-ffi) | composer require ktav-lang/ktav |
| Java | JNA (no JNI for consumers) | Maven Central: io.github.ktav-lang:ktav |
| C# / .NET | P/Invoke | dotnet add package Ktav |
The hard parts were:
- Designing a stable C ABI that doesn't leak Rust types. Strings cross the boundary as length-prefixed byte slices; everything else is opaque handles with explicit _free functions.
- Memory ownership semantics: every binding had to learn the same rule: "the parser allocates, the language frees via the free function". Documented once, repeated everywhere.
- Error propagation: Rust's Result becomes a tagged union at the FFI layer, then each language wraps it in its idiomatic equivalent (exceptions in Python/Java/C#, errors in Go, rejected promises in JS).
- Conformance tests: ~180 tests that every binding runs against. If a binding diverges from the spec, CI catches it. This was the single most valuable investment in the project.
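The first three points can be sketched together in one small C-ABI surface. Every name below (`KtavBytes`, `KtavDoc`, `KtavStatus`, `KtavResult`, `ktav_parse`, `ktav_doc_line_count`, `ktav_doc_free`) is hypothetical and not taken from the real bindings. The "parsing" is a stand-in that only counts lines, and the status-plus-pointer struct is a simplified substitute for a full tagged union.

```rust
use std::ptr;

/// Length-prefixed byte slice: pointer plus length, no NUL terminator.
#[repr(C)]
pub struct KtavBytes {
    pub ptr: *const u8,
    pub len: usize,
}

/// Opaque handle: foreign callers only ever hold a pointer to it.
pub struct KtavDoc {
    line_count: usize,
}

/// Tag that tells the caller which outcome it received.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KtavStatus {
    Ok = 0,
    NullInput = 1,
    InvalidUtf8 = 2,
}

/// Tag plus payload; `doc` is non-null only when `status` is `Ok`.
#[repr(C)]
pub struct KtavResult {
    pub status: KtavStatus,
    pub doc: *mut KtavDoc,
}

/// The Rust side allocates the document and hands ownership to the caller.
/// A real cdylib would also mark these functions `#[no_mangle]`
/// (written `#[unsafe(no_mangle)]` on the 2024 edition).
pub extern "C" fn ktav_parse(input: KtavBytes) -> KtavResult {
    if input.ptr.is_null() {
        return KtavResult { status: KtavStatus::NullInput, doc: ptr::null_mut() };
    }
    // SAFETY: the caller promises `ptr` points at `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(input.ptr, input.len) };
    match std::str::from_utf8(bytes) {
        Ok(text) => {
            // Stand-in for real parsing: remember how many lines were seen.
            let doc = Box::new(KtavDoc { line_count: text.lines().count() });
            KtavResult { status: KtavStatus::Ok, doc: Box::into_raw(doc) }
        }
        Err(_) => KtavResult { status: KtavStatus::InvalidUtf8, doc: ptr::null_mut() },
    }
}

/// Accessor that works through the opaque handle.
pub extern "C" fn ktav_doc_line_count(doc: *const KtavDoc) -> usize {
    // SAFETY: `doc` is null or a live handle returned by `ktav_parse`.
    unsafe { doc.as_ref() }.map_or(0, |d| d.line_count)
}

/// The caller frees through the matching free function, exactly once.
pub extern "C" fn ktav_doc_free(doc: *mut KtavDoc) {
    if !doc.is_null() {
        // SAFETY: `doc` came from `Box::into_raw` in `ktav_parse`.
        drop(unsafe { Box::from_raw(doc) });
    }
}
```

Each language binding would then wrap this trio: check the status, convert a non-`Ok` value into its own exception or error type, and call the free function from a destructor or finalizer.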
The WebAssembly build of the same Rust crate also powers the online playground — you can paste JSON, YAML, TOML, or INI and see the Ktav equivalent in your browser. Everything runs locally; nothing is sent to a server.
Editor tooling
Because a config format without editor support is just a frustration generator.
- LSP server in Rust (separate ktav-lsp crate): diagnostics, completions, hover info, go-to-definition for dotted keys.
- VS Code plugin: bundles the LSP, syntax highlighting via TextMate grammar.
- JetBrains plugin (IntelliJ, CLion, RustRover, WebStorm, PyCharm, GoLand, PhpStorm, Rider): bundles the LSP, syntax highlighting, indentation support.
- tree-sitter grammar: drop into Neovim, Helix, Zed, or anything else with tree-sitter support.
The LSP catches the things the parser would catch at runtime, but inline as you type: type mismatches at the lexical level, unmatched brackets, malformed multi-line strings.
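As an illustration of the unmatched-bracket diagnostic, here is a sketch of a line-based checker. It is not the ktav-lsp implementation: `Diagnostic` and `check_brackets` are hypothetical, and the sketch assumes the layout used in this article's examples, where an opener is the last character of its line and a closer sits alone on its own line. It also ignores the `::` raw form.

```rust
/// One editor diagnostic: a 1-based line number plus a message.
#[derive(Debug, PartialEq)]
struct Diagnostic {
    line: usize,
    message: String,
}

/// Report unmatched `{`, `[`, `(` under the simplified layout assumed above.
fn check_brackets(text: &str) -> Vec<Diagnostic> {
    let mut stack: Vec<(char, usize)> = Vec::new(); // (opener, line it opened on)
    let mut out = Vec::new();

    for (idx, raw) in text.lines().enumerate() {
        let line_no = idx + 1;
        let line = raw.trim();

        // Inside a multi-line string everything is content until a lone `)`.
        if matches!(stack.last(), Some(('(', _))) {
            if line == ")" {
                stack.pop();
            }
            continue;
        }
        if line.starts_with("##") {
            continue; // whole-line comment
        }
        // An opener at the end of the line starts a nested structure.
        if let Some(c @ ('{' | '[' | '(')) = line.chars().last() {
            stack.push((c, line_no));
            continue;
        }
        // A closer alone on its line must match the most recent opener.
        let expected = match line {
            "}" => '{',
            "]" => '[',
            ")" => '(',
            _ => continue,
        };
        match stack.pop() {
            Some((open, _)) if open == expected => {}
            Some((open, opened_at)) => out.push(Diagnostic {
                line: line_no,
                message: format!("`{}` does not close `{}` from line {}", line, open, opened_at),
            }),
            None => out.push(Diagnostic {
                line: line_no,
                message: format!("`{}` has no matching opener", line),
            }),
        }
    }
    // Anything still open at end of input was never closed.
    for (open, opened_at) in stack {
        out.push(Diagnostic {
            line: opened_at,
            message: format!("`{}` is never closed", open),
        });
    }
    out
}
```

An editor integration would map each `Diagnostic` onto a squiggle at the reported line; the point of the sketch is only that this class of error is cheap to detect on every keystroke.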
Honest caveats
I should be upfront about what this is and isn't.
- It's young. Spec is at 0.6.x, format is still evolving though I don't expect breaking changes from here.
- No production users I know of besides me. I built it because I wanted it. The ecosystem grew because wrapping one Rust core in FFI turned out to be much more tractable than I expected.
- It doesn't replace TOML for Cargo. It doesn't replace YAML for Kubernetes. Those formats have entire ecosystems built on them and Ktav has zero. I'm not trying to displace anything — I'm offering a different trade-off for people who, like me, want JSON's flexibility without JSON's ceremony.
- Parsing speed is comparable to serde_json for typical configs (< 100 KB), but I'm not claiming it beats simd-json on 500 MB inputs. It's a config format, not a data interchange format.
What exists today
All open-source, dual-licensed MIT OR Apache-2.0.
- Specification: ktav-lang/spec (formal grammar, semantics, ~180 conformance tests)
- Rust reference: ktav on crates.io, with serde support
- JS / TS: @ktav-lang/ktav on npm
- Python: ktav on PyPI
- Go: github.com/ktav-lang/golang
- PHP: ktav-lang/ktav on Packagist
- Java: io.github.ktav-lang:ktav on Maven Central
- C# / .NET: Ktav on NuGet
- LSP server: ktav-lsp on crates.io
- Editor plugins: ktav-lang/editor (VS Code + JetBrains)
- Tree-sitter: ktav-lang/tree-sitter-ktav
- Online playground: ktav-lang.github.io
Everything lives under the ktav-lang organization on GitHub.
What I'd love feedback on
Is :: the right shape for "forced literal string"? Alternatives I considered: :' (looks like a quote), :" (same), :s (explicit type), := (looks like assignment in some languages). I picked :: because it feels like "the same marker, doubled", but I'm not sure.
Should multi-line strings preserve trailing whitespace? Right now they're trimmed. I went back and forth.
Are dotted keys worth their complexity? They're nice when present but they're the only "two ways to do the same thing" feature in the format.
The FFI-everywhere strategy — am I underestimating maintenance cost? Right now ~180 conformance tests catch divergences in CI. Is there a scale at which this breaks down?
If you read this far, thank you. Issues and PRs are welcome anywhere in the org, and so is any feedback that helps me understand what trade-offs to make next.
(Drafted with assistance from an LLM editor)