TL;DR: Synoema stores documentation as executable state in the AST. Doc comments (
---) and theirexample:assertions are parsed, tested, and rendered from a single source of truth. Stale docs fail the test suite. 56% fewer tokens than Python equivalent.
Who this is for. If you've ever found an outdated docstring that claimed a function returned a string when it actually returns a list, this article is for you. Whether you maintain a library, write code with AI assistants, or just want documentation that can't lie вАФ read on.
Documentation lies. Not intentionally, but inevitably. A developer changes a function's behavior, forgets to update the docstring, and now the documentation describes code that no longer exists. This drift is not a personal failing вАФ it's a structural problem. Traditional programming languages treat documentation as metadata: a passive comment attached to code, never executed, never verified.
What if the language itself made stale documentation structurally impossible?
In this article вАФ Part 14 of Token Economics of Code вАФ I'll describe a paradigm where documentation is stored as executable state directly in the AST, verified on every test run, and consumed by both humans and LLMs from a single source of truth.
The Scale of the Problem
The disconnect between documentation and code is well-documented (ironically). A 2023 study by Wen et al. found that 25.5% of Python docstrings in popular open-source projects are inconsistent with their corresponding function signatures. One in four.
The cost isn't just confusion. When an LLM reads a stale docstring to understand your codebase, it generates code based on incorrect context. That code fails. The failure triggers a retry. The retry consumes tokens. The tokens cost money and energy. Documentation debt becomes inference debt.
Three dominant approaches exist today. None solve the problem:
| Approach | Language | Verification | Drift risk |
|---|---|---|---|
| Docstrings | Python |
doctest (opt-in, fragile) |
High вАФ separate from tests |
| JSDoc / TSDoc | JS / TS | None вАФ comments only | Very high |
| Haddock | Haskell | None вАФ rendered to HTML | Moderate |
Python's doctest module comes closest, but it has a fundamental limitation: it compares string representations of output, not semantic values. A change in __repr__ breaks every doctest. And doctest extraction relies on regex-level parsing, not the language's own AST.
The Paradigm: Documentation as AST State
Synoema takes a different approach. Documentation is a first-class syntactic element вАФ not a comment convention, but a token type recognized by the lexer, preserved in the AST, and consumed by the compiler toolchain.
The syntax uses triple-dash ---:
--- Compute factorial.
--- example: fact 5 == 120
fact 0 = 1
fact n = n * fact (n - 1)
Three things happen when the parser encounters ---:
- The lexer emits a
Token::DocComment(String)вАФ distinct from--(regular comment, stripped during tokenization) - The parser collects consecutive doc lines and attaches them to the next declaration as
doc: Vec<String>in the AST - Lines starting with
example:are flagged as executable assertions
This is not a wrapper around regular comments. It's a distinct token class, occupying exactly 1 BPE token in the cl100k_base vocabulary. Regular comments (--) are invisible to the AST. Doc comments (---) persist through the entire compilation pipeline.
Here is the key difference from traditional approaches:
Traditional: Source вЖТ [strip comments] вЖТ AST вЖТ Compile
Synoema: Source вЖТ AST (with doc: Vec<String>) вЖТ Compile + Test + Doc
Documentation is not stripped. It travels with the code.
How It Works: From Lexer to Test Runner
Let me trace the full pipeline for a single doctest.
Step 1: Lexing. The scanner encounters --- and calls scan_doc_comment():
Input: "--- example: fact 5 == 120\n"
Output: Token::DocComment("example: fact 5 == 120")
The text after --- is captured verbatim, with leading whitespace trimmed.
Step 2: Parsing. The parser's collect_doc_comments() method gathers consecutive DocComment tokens into a vector. When it hits a function declaration, it attaches the vector:
Decl::Func {
name: "fact",
equations: [...],
doc: ["Compute factorial.", "example: fact 5 == 120"],
span: ...,
}
Both the human-readable description ("Compute factorial") and the executable assertion ("example: fact 5 == 120") live in the same Vec<String>. No separate metadata structure. No JSON sidecar. One field.
Step 3: Test extraction. When you run synoema test, the extract_doctests() function walks every declaration, finds lines starting with example:, and splits them:
"example: fact 5 == 120"
^^^^^^ ^^^
expr expected
The split respects bracket nesting вАФ example: head [1 2 3] == 1 correctly identifies head [1 2 3] as the expression and 1 as the expected value.
Step 4: Execution. Each doctest is evaluated by appending it to the full module source:
-- Original source loaded here --
__doctest_val = fact 5
The result is compared against the expected value (also evaluated in the same context). If they match вАФ pass. If not вАФ fail with a diagnostic showing the expression, expected value, and actual value.
This means doctests have access to every definition in the file. They run in the real evaluation environment, not a sandboxed mock. If the function changes behavior, the doctest catches it.
Three Testing Tiers, One Pipeline
Synoema unifies three kinds of verification into a single synoema test command:
Tier 1: Doctests вАФ inline assertions in doc comments.
--- Reverse a list.
--- example: reverse [1 2 3] == [3 2 1]
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
Tier 2: Unit tests вАФ named boolean assertions using the test keyword.
test "fact base" = fact 0 == 1
test "fact 10" = fact 10 == 3628800
test "sort then reverse" = reverse (qsort [3 1 2]) == [3 2 1]
Tier 3: Property tests вАФ generative testing with the prop keyword.
test "reverse involution" = prop xs -> reverse (reverse xs) == xs
test "sort idempotent" = prop xs -> qsort (qsort xs) == qsort xs
test "fact positive" = prop n -> fact n >= 1 when n >= 0 && n <= 10
Property tests use Hindley-Milner type inference to determine what values to generate. The variable xs in prop xs -> reverse (reverse xs) == xs is inferred as List a вАФ so the test runner generates random lists. The variable n in prop n -> fact n >= 1 is inferred as Int вАФ so it generates random integers. No manual type annotations required.
The when clause filters generated values: when n >= 0 && n <= 10 discards any n outside that range before evaluating the property. 100 valid trials per property, deterministic seed for reproducibility.
All three tiers run together:
$ synoema test examples/testing.sno
testing.sno
doctests: 4 passed, 0 failed
unit tests: 4 passed, 0 failed
properties: 5 passed, 0 failed (500 trials)
Total: 13 passed, 0 failed
Documentation Generation: Same Source, Different Output
The same doc: Vec<String> that drives testing also drives documentation generation. The synoema doc command reads the AST and renders it вАФ without re-parsing, without a separate doc format, without Markdown source files.
Markdown output (synoema doc --format md):
Interleaves doc lines as prose and declarations as code blocks. Lines starting with example: are rendered as highlighted code snippets. Metadata lines (guide:, order:, requires:) control page title, ordering, and dependency tracking вАФ but are invisible in the rendered output.
JSON output (synoema doc --format json):
Exports structured metadata for tooling:
{
"file": "examples/testing.sno",
"functions": [
{
"name": "fact",
"doc": ["Compute factorial.", "example: fact 5 == 120"],
"line": 7
}
]
}
This JSON is consumed by the MCP server, which exposes Synoema documentation to LLM agents. The documentation that LLMs read is the same documentation that tests verify. There is no gap.
Why LLMs Care
When an LLM generates or modifies Synoema code, it reads doc comments as part of the source context. Those comments are guaranteed to be accurate вАФ because if they weren't, synoema test would have failed.
This creates a feedback loop:
LLM reads doc вЖТ generates code вЖТ code changes behavior вЖТ
synoema test catches stale docs вЖТ developer updates docs вЖТ
LLM reads updated docs вЖТ ...
In traditional languages, the loop has a silent gap: nothing catches stale docs. The LLM operates on incorrect context, generates incorrect code, and the developer blames the LLM rather than the documentation.
There's a second benefit specific to token economics. Doc comments in Synoema use --- (1 BPE token) instead of Python's """...""" (at least 2 tokens for delimiters) or JSDoc's /** ... */ (3+ tokens). Each example: line is 1 token for the keyword. The documentation syntax itself is token-efficient вАФ consistent with the language's design principle of minimizing BPE token count.
Side-by-Side: Python vs Synoema
Let's compare equivalent documented, tested code:
Python (32 tokens):
def fact(n):
"""Compute factorial.
>>> fact(5)
120
"""
if n == 0:
return 1
return n * fact(n - 1)
Synoema (14 tokens):
--- Compute factorial.
--- example: fact 5 == 120
fact 0 = 1
fact n = n * fact (n - 1)
Same function. Same documentation. Same executable test. 56% fewer tokens. But the meaningful difference isn't token count вАФ it's that Python's doctest compares string output ("120") while Synoema's compares evaluated values (120 == 120). Change fact to return a float, and Python's doctest breaks on "120.0". Synoema's doesn't.
Try It
Install and run doctests on the example suite:
# Install Synoema
cargo run -p synoema-repl -- install
# Run all tests (doctests + unit + property)
synoema test examples/
# Run tests for a specific file
synoema test examples/testing.sno
# Generate documentation
synoema doc examples/testing.sno
synoema doc --format json examples/testing.sno
Write your own documented function:
--- Double every element in a list.
--- example: double_all [1 2 3] == [2 4 6]
double_all xs = [x * 2 | x <- xs]
Save it as my_funcs.sno and run synoema test my_funcs.sno. The example assertion becomes a test. The description becomes documentation. One source, two outputs, zero drift.
What's Next
In the next article, we'll explore the future of code generation вАФ how compilation, type inference, and executable documentation combine into an agentic pipeline where LLMs don't just write code, but verify it.
Part 14 of "Token Economics of Code" by @andbubnov. Synoema is open-source: github.com/Delimitter/synoema.
Glossary
| Term | Explanation |
|---|---|
| AST | Abstract Syntax Tree вАФ the parsed structure of source code |
| BPE | Byte Pair Encoding вАФ how LLMs split text into tokens |
| Doctest | An executable example embedded in documentation |
| Doc comment | A --- line in Synoema that persists in the AST |
| Hindley-Milner | Type inference algorithm вАФ determines types without annotations |
| MCP | Model Context Protocol вАФ connects LLM agents to external tools |
| Property test | A test that verifies a property holds for random inputs |
| Token | Smallest text unit for an LLM, roughly 3-4 characters |
Top comments (0)