Mat Weiss

Posted on Apr 17

Code Intelligence Is Being Retrofitted. Ruuk Builds It In.

#architecture #ai #software #programming

A response to Thoughtworks Technology Radar Vol. 34, Blip 18: Code Intelligence as Agentic Tooling

Blip 18 of the April 2026 Thoughtworks Radar names a real problem: AI coding agents are effectively blind to the meaning of the code they operate on. The Radar's answer is richer tooling — LSP integrations, OpenRewrite's Lossless Semantic Tree, JetBrains MCP servers.

That's the practical path for mainstream languages. But it's still using the AST to infer intent. What would it look like to use the AST to see it?

Ruuk — a language I'm designing — takes a different position: the constraints worth enforcing should be in the language itself.

What the AST Cannot Tell You

Take a typical enterprise operation: approving an order. In Java, an agent with full LSP access sees something like this:

public ApprovalResult approveOrder(OrderId orderId, CSR csr) {
    Order order = orderStore.findById(orderId);
    // ... validation logic
    // ... state transition
    // ... outcome handling
}

The agent knows the function name, its signature, its call sites, and the types it touches. What it doesn't know is more consequential:

order must be in Created state for this call to be valid. Calling it on a Cancelled order is a logic error, not a type error — the compiler won't catch it, and neither will the agent.
There are exactly two outcomes — Approved and Rejected — and callers must handle both. The agent has to read the implementation to find out.
This is an instantaneous state transition, not a long-running process. That distinction matters for error handling and compensation strategy. The function signature doesn't say.
orderStore is a mutation target, not a read-only data source: the call writes the order's new state through it. The agent can't tell that from the call.

Modern Java closes some of these gaps. Sealed classes (Java 17) and pattern matching for switch (Java 21) give you exhaustive outcome handling. Some languages go further: TypeScript's discriminated unions and Rust's typestate patterns encode more intent into the type system. But unless a team hand-builds a typestate encoding for each entity, the precondition that the order must be in Created state remains a convention, not a compiler-checked constraint. The resource mutation role is still invisible. The state machine is still implicit. These features moved the needle on what can happen; by default they don't address what must be true before or what kind of change is occurring.
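As a rough illustration of where modern Java stops, the sketch below models the two outcomes as a sealed interface, so a switch over them has to be exhaustive. The names ApprovalResult, OrderState, Order, and ApprovalSketch are assumptions made for this example, and the rejection rule is invented. Note that the Created precondition is still an ordinary runtime check.

```java
// Sketch only (Java 21+). All type names here are illustrative assumptions.
sealed interface ApprovalResult permits Approved, Rejected {}
record Approved(String orderId) implements ApprovalResult {}
record Rejected(String reason) implements ApprovalResult {}

enum OrderState { CREATED, APPROVED, CANCELLED }

record Order(String id, OrderState state) {}

class ApprovalSketch {
    // The signature accepts an Order in any state; the precondition is only a runtime check.
    static ApprovalResult approveOrder(Order order) {
        if (order.state() != OrderState.CREATED) {
            throw new IllegalStateException("order must be in CREATED state");
        }
        // Invented rejection rule, just to produce both outcomes.
        return order.id().isBlank()
                ? new Rejected("missing order id")
                : new Approved(order.id());
    }

    // Exhaustive switch: adding a third permitted subtype makes this fail to compile.
    static String describe(ApprovalResult result) {
        return switch (result) {
            case Approved a -> "approved " + a.orderId();
            case Rejected r -> "rejected: " + r.reason();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(approveOrder(new Order("o-1", OrderState.CREATED))));
        // The next call would compile and fail only when it runs:
        // approveOrder(new Order("o-2", OrderState.CANCELLED));
    }
}
```

The compiler guards the outcomes, while the state precondition surfaces only as an exception at runtime.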

An agent can reconstruct those properties for any well-written function by reading its implementation and tests. The problem is scale. Across a codebase of thousands of operations, inference compounds uncertainty. It fails hardest on the code that needs agents most: legacy systems, inconsistent patterns, missing tests. Structural declarations don't degrade. The thousandth operation is as machine-readable as the first.

What Ruuk Exposes at Declaration Time

In Ruuk, the same operation looks like this:

resource Order = Created -> Approved -> Shipped -> Delivered

op approveOrder (subject: Order<Created>) (by: String) (goal: OrderStore) =
    performs Order.Created -> Order.Approved
    outcomes
        | Approved of Order<Approved>
        | Rejected of String

Everything the Java version hid is in the declaration. The precondition is in the type: Order<Created>. Passing an Order<Cancelled> is a compile error. The state transition is explicit in the performs clause. The outcomes are enumerated, and the compiler verifies that every call site handles both. The subject role marks the order as the entity being changed; the goal role marks the store as the mutation target, distinct from a read-only data source.

The compiler checks these declarations against the implementation. If the function body transitions to a state that doesn't match the performs clause, it won't compile. If you add an outcome variant without updating call sites, they won't compile. Declarations can't drift from reality because they're verified at compile time, then erased. Zero runtime cost.

That's a meaningful difference from design-by-contract predecessors like Eiffel and JML, where contracts were primarily runtime-checked or relied on external verification tools. In Ruuk, the declaration is the constraint, and the compiler enforces it statically.
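For readers more at home in Java than in Ruuk, the state-in-the-type idea can be loosely approximated with a hand-built phantom type parameter. This is a sketch under assumptions: the State, Created, Approved, and TypestateSketch names are invented here, and nothing in it reflects how Ruuk's compiler works. In particular, Java does not check that a method body performs the transition its signature advertises, which is the job the performs clause is described as doing.

```java
// Sketch only: approximates the idea behind Order<Created> with Java phantom types.
// Every name here is an illustrative assumption, not Ruuk's implementation.
interface State {}
final class Created implements State {}
final class Approved implements State {}

// S is never instantiated; it only tags the order with its lifecycle state.
final class Order<S extends State> {
    final String id;

    private Order(String id) { this.id = id; }

    static Order<Created> create(String id) { return new Order<>(id); }

    // Only callable with an Order<Created>; the return type records the transition.
    static Order<Approved> approve(Order<Created> order) {
        return new Order<>(order.id);
    }
}

class TypestateSketch {
    public static void main(String[] args) {
        Order<Created> created = Order.create("o-1");
        Order<Approved> approved = Order.approve(created);
        System.out.println("approved " + approved.id);
        // Order.approve(approved);  // does not compile: Order<Approved> is not Order<Created>
    }
}
```

The encoding has to be written and maintained by hand for each entity, and raw types can bypass it. That is the gap a language-level declaration is meant to close.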

I designed it this way because the information always existed. Every team I've worked with on enterprise applications knew their preconditions, their state machines, their failure modes. They just had no place to put that knowledge where the compiler could use it.

From this declaration alone, an agent can answer — without reading one line of implementation — every question the Java version left open. The Radar points to the AST and the richer Lossless Semantic Tree as the right representations for agent tooling. Those capture structure: how code is organized, what the syntactic relationships are. Ruuk's declarations go further — they capture meaning: what must be true, what can happen, and what kind of change is occurring.

What Could an Agent Do With This?

Impact analysis. When a field on Order changes, an agent can ask: which projections include this field? A Ruuk projection is a typed, compiler-checked subset of a resource's fields. For example, CustomerOrderView = Order only { id; customer; total } declares exactly which fields downstream consumers can see. Finding every projection affected by a schema change is a structural query, not a grep through the codebase.
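To make "structural query" concrete, here is a minimal sketch in Java of what such a lookup might reduce to once projection declarations are available as data. The ProjectionIndex class is hypothetical, WarehouseOrderView and its fields are invented, and only CustomerOrderView's field list comes from the example above. How Ruuk would actually expose this information is not specified here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a hypothetical in-memory index of declared projections.
class ProjectionIndex {
    // projection name -> the resource fields it exposes
    static final Map<String, Set<String>> PROJECTIONS = Map.of(
            "CustomerOrderView", Set.of("id", "customer", "total"),
            "WarehouseOrderView", Set.of("id", "shippingAddress")); // invented entry

    // Structural query: which projections expose the given field?
    static List<String> projectionsIncluding(String field) {
        return PROJECTIONS.entrySet().stream()
                .filter(e -> e.getValue().contains(field))
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(projectionsIncluding("id"));
        System.out.println(projectionsIncluding("total"));
    }
}
```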

Outcome coverage. Which operations have outcomes stubbed with todo? In Ruuk, todo is a compiler-tracked placeholder, not a comment. Which call sites are missing a handler for a specific variant? The answers are compiler-verified facts, not heuristic inference.
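The handler question is essentially a set difference between declared variants and handled variants. The Java sketch below assumes that information is already available as plain data. OutcomeCoverage and both call-site names are invented for illustration, and only the Approved and Rejected variants come from the declaration shown earlier.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: hypothetical data standing in for what a compiler could report.
class OutcomeCoverage {
    // Declared outcome variants of approveOrder, as in the article.
    static final Set<String> DECLARED = Set.of("Approved", "Rejected");

    // call site -> variants it handles (both call sites are invented)
    static final Map<String, Set<String>> HANDLED = Map.of(
            "CheckoutFlow", Set.of("Approved", "Rejected"),
            "BatchImport", Set.of("Approved"));

    // Declared variants minus handled variants = missing handlers at that call site.
    static Set<String> missingAt(String callSite) {
        Set<String> missing = new LinkedHashSet<>(DECLARED);
        missing.removeAll(HANDLED.getOrDefault(callSite, Set.of()));
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingAt("BatchImport"));
        System.out.println(missingAt("CheckoutFlow"));
    }
}
```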

State machine queries. A resource like Order carries its current state in its type: Order<Approved>, Order<Shipped>. The compiler knows which operations are valid for which states. What transitions are reachable from Order<Created>? What operations apply when the order is in transit? The answers come from the declared lifecycle, queryable and independent of implementation.
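A reachability question like this is a plain graph walk once the lifecycle is data. The Java sketch below hard-codes the Created -> Approved -> Shipped -> Delivered lifecycle from the resource declaration as an adjacency map. The LifecycleQuery class and the map representation are assumptions for illustration, not output from a real Ruuk compiler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: the declared lifecycle held as a hypothetical adjacency map.
class LifecycleQuery {
    static final Map<String, List<String>> TRANSITIONS = Map.of(
            "Created", List.of("Approved"),
            "Approved", List.of("Shipped"),
            "Shipped", List.of("Delivered"),
            "Delivered", List.of());

    // Breadth-first walk over declared transitions. The start state is included
    // in the result only if some cycle leads back to it.
    static Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            for (String next : TRANSITIONS.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachableFrom("Created")); // [Approved, Shipped, Delivered]
    }
}
```

The walk never opens a method body; it needs only the declared transitions.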

Role-based refactoring. An agent told to "add audit logging to every operation that mutates order state" doesn't need to read method bodies. It queries the declared subject role across every operation, identifies the affected call sites, and makes the change. The semantic structure tells it exactly where to edit and what contract to preserve.
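The selection step of that refactoring could look roughly like the Java sketch below, which filters a catalog of operation declarations by subject and declared transition. OpDecl, RoleQuery, and the catalog format are hypothetical. Only approveOrder and its Created -> Approved transition come from the earlier declaration; shipOrder and renameCustomer are invented entries.

```java
import java.util.List;

// Sketch only: a hypothetical catalog of operation declarations.
record OpDecl(String name, String subject, String fromState, String toState) {
    // An operation mutates state if it declares a transition between two different states.
    boolean mutatesState() {
        return fromState != null && !fromState.equals(toState);
    }
}

class RoleQuery {
    static final List<OpDecl> CATALOG = List.of(
            new OpDecl("approveOrder", "Order", "Created", "Approved"),
            new OpDecl("shipOrder", "Order", "Approved", "Shipped"),   // invented
            new OpDecl("renameCustomer", "Customer", null, null));     // invented, no transition

    // Select operations by declared subject and transition, without reading method bodies.
    static List<String> opsMutating(String resource) {
        return CATALOG.stream()
                .filter(op -> op.subject().equals(resource) && op.mutatesState())
                .map(OpDecl::name)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(opsMutating("Order"));
    }
}
```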

In each case, the agent spends fewer tokens reconstructing context and gets closer to a correct edit on the first pass. Tooling can surface only what the code records, so better tooling can't raise that ceiling if the language never captured the information.

Blip 18 asks how to make agents smarter about existing code. That's the right question for existing codebases, and the Radar's answers are practical.

Ruuk is exploring a different one: what happens when the language itself captures enough domain semantics that agents — and humans — can reason from declarations alone?

That's a bet on language design, not tooling. Ruuk is in early development, and none of this has been tested at scale with real agents. But I believe the overhead pays for itself: what Ruuk asks you to declare is what you already know. The cost isn't in knowing your preconditions and state machines. It's in not having a compiler that checks them.

The language design is documented on GitHub, where I'm using Discussions to think through design decisions in the open. I also write about Ruuk's design rationale here on dev.to.

Top comments (1)

Erik Sargent • Apr 19

Super interesting idea Mat! The function signature becomes high-density context for the agent so it actually solves two problems - one is the issue of context and intent that you've pointed out, but also this dense context helps solve the needle-in-the-haystack issue for large codebases since the intent is always with the function and doesn't have to be searched for within the context window.

I also like it because we often memorialize intent and behavior in unit tests. But those live in a separate code file and a separate compilation pathway. They're easy to ignore, and even with good test coverage, unintended behaviors and regressions can occur from odd cases.

Modern languages were created for human productivity. As the roles of humans and AI change, so should the tooling. In the era of AI-generated code, the most expensive thing in a codebase isn't boilerplate; it's ambiguity. Languages that force explicit intent eliminate the hallucination and context taxes before they are paid.