Cx Dev Log — 2026-03-30

#cx #programming

Two spec-level decisions landed on submain today. The UTF-8 string encoding model for Cx is locked, and the semicolon rule is now defined more precisely. Neither commit introduces a new feature, but both remove blockers by settling open design questions. The frontend matrix holds at 78/78.

UTF-8, locked everywhere

The debate over string encoding is concluded: Cx is strictly UTF-8. Here’s what that means:

All source files must be in UTF-8.
str values must be valid UTF-8 at runtime, or they trigger an error.
char refers to a Unicode scalar value.
Binary data? That's for byte buffers, not str.
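
Rust's standard library follows the same model, so a short Rust sketch (not Cx code) can illustrate the rules above. The helper names are hypothetical and say nothing about the actual Cx runtime API, which this log does not show.

```rust
/// Hypothetical helper: view raw bytes as a string only if they are valid UTF-8.
/// Anything else stays a byte buffer and is reported as an error here.
fn bytes_to_str(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: build a `char` only from a Unicode scalar value.
/// Surrogates (U+D800..=U+DFFF) and values above U+10FFFF yield `None`.
fn scalar_from_u32(code: u32) -> Option<char> {
    char::from_u32(code)
}

/// Returns (byte length, scalar count); the two differ for non-ASCII text.
fn byte_and_scalar_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

In this sketch, bytes_to_str(&[0xC3, 0x28]) is an error because 0xC3 opens a two-byte sequence that 0x28 cannot continue, and scalar_from_u32(0xD800) is None because surrogates are not scalar values.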

This decision is now documented in Section 25 of docs/cx_syntax.md and checked off as a completed hard blocker on the roadmap. There are no changes to runtime or compiler code here; this is purely a spec decision. It still matters, because it unblocks stdlib and filesystem work: without a locked string encoding model, file I/O and string manipulation functions could not be specified.

The decision deliberately mirrors Rust's approach—no room for Latin-1 fallback, no WTF-8, and no byte strings pretending to be str. For a systems language that aims for predictable string behavior, this choice makes sense. The design space is now firmly closed on this front.

Semicolon rule refinement

The semicolon policy is now defined more precisely, though it is still not fully resolved. A single parser edit at src/frontend/parser.rs:377 makes the semicolon after a declaration optional, using .then_ignore(semi.clone().or_not()), while expression statements still require one.

The reasoning is simple: the parser has no newline awareness, so x + 1 in expression-statement position is ambiguous. It could be a complete statement, or the start of a larger expression that continues on the next line. Requiring semicolons after expression statements removes the ambiguity without rewriting the parser.
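
The ambiguity can be shown with a toy lexer. This is a minimal Rust sketch, assuming a lexer that discards newlines along with other whitespace; the Token type and lex function are hypothetical and are not taken from the Cx source.

```rust
#[derive(Debug, PartialEq, Clone)]
enum Token {
    Ident(String),
    Num(String),
    Plus,
    Minus,
    Semi,
}

/// Hypothetical newline-blind lexer: all whitespace, including '\n', is skipped,
/// so the parser never learns where a line ended.
fn lex(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            let mut digits = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    digits.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            out.push(Token::Num(digits));
        } else if c.is_alphabetic() || c == '_' {
            let mut name = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    name.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            out.push(Token::Ident(name));
        } else {
            chars.next();
            match c {
                '+' => out.push(Token::Plus),
                '-' => out.push(Token::Minus),
                ';' => out.push(Token::Semi),
                _ => {} // any other character is ignored in this sketch
            }
        }
    }
    out
}
```

With this lexer, "x + 1" followed by "- y" on the next line produces exactly the same tokens as "x + 1 - y" on one line, so a parser working from the tokens alone cannot tell two statements from one. An explicit semicolon after x + 1 is what makes the two inputs differ.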

The roadmap's Known Gaps entry has been updated from a vague note about inconsistent behavior to a specific account of what works and what does not. Fully optional semicolons are officially deferred past version 0.1, pending a newline-aware parser redesign.

That is an honest accounting of the gap between design and implementation: the commit states where the project stands and why it is not there yet.

Spec-tightening phase

Today's contributions lean heavily on documentation rather than code. The UTF-8 commit involves no code changes, and the semicolon tweak is a single line paired with documentation updates. This indicates that the project is in a spec-tightening phase—locking choices, documenting limits, and finalizing design questions instead of cranking out new features.

It's a sensible focus. The frontend matrix remains stable at 78/78. Backend operations, like scalar layout, have previously been solidified through ABI work. Upcoming challenges like the test runner and error model benefit from a settled spec foundation.

The submain gap

These updates are on submain, which is now four commits ahead of main. The other recent commits cover the arithmetic wrapping and while-loop lowering work started earlier. None of them have merged into main yet. The roadmap has diverged (version 4.7 on main versus 4.6 on submain), which points to merge conflicts ahead in docs/frontend/ROADMAP.md.

This marks the fourth day in a row where a predicted submain-to-main merge hasn't happened. It isn't blocking anything critical yet, but the longer the branches stay apart, the harder the eventual merge is likely to be.

What is next

The merge is the immediate priority. Beyond that, the remaining hard blockers, the test runner and the Result error model, have seen no progress despite a week of predictions that they would move. On the backend, if/else lowering and struct layout are the logical next targets. We'll soon see whether the project stays in spec-tightening mode or shifts back into full implementation.

Follow the Cx language project: