Handwriting code is the new cursive. AI agents write code competently, and they're improving fast. My recent agentic work spans refactoring C++ machine learning libraries, writing CLIs in Rust, building web apps in ASP.NET, and shipping mobile apps in Flutter. Agents handle the mechanical parts well — scaffolding, boilerplate, repetitive transformations — and I've had them write entire features on their own.
It makes you wonder: if the agent writes the code, isn't the language just an implementation detail? The intuition makes sense — why care about syntax you'll never type? But the language isn't just what the agent writes in. It's what the compiler checks, what the human reviews, and what determines how fast the feedback loop closes. Agentic coding raises the stakes for language design, and it points toward specific properties a language should have.
Better Compiler Feedback Multiplies Agent Productivity
My agentic coding experience has varied by language. Part of that is training data — AI models have seen more code in some languages than others. But the larger factor is whether mistakes are caught at compile time or runtime, which determines how quickly the loop closes.
In Rust, when an agent generates something wrong, the compiler identifies the exact location, names the violated constraint, and usually suggests a fix. The agent iterates on that feedback directly — no execution required. When checks happen at runtime instead, there's an extra round-trip: generate code, execute tests, parse results, feed output back to the agent. More wall time per iteration, more tokens spent on test output instead of code. Same agent, longer loop, higher cost per correction. And those runtime checks are only as good as the inputs that exercise them. Communities built around runtime-checked languages know this well — it's why they invest heavily in defensive testing, property-based tools like Hypothesis, and comprehensive test suites. But even thorough tests depend on the paths you think to exercise. A compile-time check flags the error unconditionally.
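To make the difference concrete, here's a deliberately small TypeScript sketch (the function names are invented for illustration). The first mistake surfaces without any execution; the second only surfaces if a test happens to pass bad input.

// Compile-time check: the agent sees this error without running anything.
function area(width: number, height: number): number {
  return width * height;
}
// Compile error: argument of type 'string' is not assignable to 'number'.
area("3", 4);

// Runtime check: the same mistake surfaces only when a test exercises
// this path: generate, execute, parse output, feed it back.
function areaLoose(width: unknown, height: unknown): number {
  if (typeof width !== "number" || typeof height !== "number") {
    throw new TypeError("width and height must be numbers");
  }
  return width * height;
}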
This isn't a judgment on any particular language — it's a property of where in the cycle errors surface. The industry has started redesigning CLI tools for agents — Trevin Chow's seven principles for agent-friendly CLIs capture the pattern: structured output, unambiguous interfaces, clear error signals. The same thinking applies to compilers. The compiler is the agent's primary feedback surface — and most compilers were designed before agentic coding existed.
Better compiler output is the starting point. The deeper question is what classes of mistakes the compiler can catch. The best current compilers handle memory errors (Rust), type mismatches (TypeScript), and null dereferences (Kotlin). Entire categories of domain mistakes remain invisible: unhandled operation outcomes, invalid state transitions, missing field mappings when converting between data representations. These aren't obscure edge cases — they slip past review and surface in production.
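To illustrate what "invisible" means here, consider a TypeScript sketch (the Order type is invented for the example). Even with a union type constraining the status values, the legal transitions between them go unchecked:

// The union constrains the values, not the moves between them:
// nothing stops cancelling an order that has already shipped.
type Order = { status: "placed" | "shipped" | "cancelled" };

function cancel(order: Order): void {
  order.status = "cancelled"; // compiles from any prior state
}

The valid-transition rules live only in the team's heads, which is exactly where generated code can't see them.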
The Development Model Needs a Third Party
Our current approach to AI coding is a two-party arrangement: humans describe intent, agents write code. Somebody is missing from the party.
The danger isn't that AI writes bad code. It's that humans lose the ability to evaluate what AI writes — and the two-party model accelerates this. The volume of AI-generated code is already outpacing review capacity; the trend is toward reviewing less, not more. That makes the compiler more load-bearing, not less. A smarter AI doesn't close this gap on its own — even the best engineer benefits from code review, because independent verification catches what self-consistency misses. The same principle applies to generated code, except the stakes are higher: the reviewer understands less of the codebase with each generation cycle.
The Rust model points at the answer: compiler-enforced properties rather than runtime hopes. The question is whether we can extend that from memory safety to operational semantics. Expanded compiler checking gives us a basis for trusting AI-generated code — not because the AI earned that trust, but because an independent third party verified the structural claims. Checks and balances: human + compiler + AI, each with a distinct responsibility.
The human defines the operation's shape: its outcomes, state transitions, data projections. The AI generates the implementation body. The compiler stands between them, rejecting anything that violates the declared structure. This changes what a developer needs to review. The programmer's job is to get the declaration right — to ensure the outcome variants are exhaustive, the state transitions are valid, the field projections are accurate. That's the work only a human can do: verifying that the declaration honestly represents the domain. Once that's done, the compiler owns enforcement everywhere.
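A rough TypeScript analogue of that division of labor, with hypothetical names, where a discriminated union stands in for a first-class outcome declaration:

// Human-owned declaration: the operation's full shape.
type SendOutcome =
  | { kind: "sent"; messageId: string }
  | { kind: "bounced"; reason: string }
  | { kind: "rateLimited"; retryAfterSeconds: number };

// AI-generated body: whatever it does internally, it must return one
// of the declared outcomes or it won't compile.
declare function sendEmail(to: string, body: string): Promise<SendOutcome>;

// Compiler-enforced handling: the `never` assignment turns a missing
// case into a compile error rather than a production surprise.
function report(outcome: SendOutcome): string {
  switch (outcome.kind) {
    case "sent":
      return `sent as ${outcome.messageId}`;
    case "bounced":
      return `bounced: ${outcome.reason}`;
    case "rateLimited":
      return `retry in ${outcome.retryAfterSeconds}s`;
    default: {
      const unhandled: never = outcome;
      return unhandled;
    }
  }
}

If the union gains a variant, every switch like this stops compiling until someone handles it. That is the compiler owning enforcement.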
The languages most of us work in weren't designed around the assumption that you'd be reading tens of thousands of lines generated by someone else, under time pressure. That's the new reality. The closer a language's constructs map to domain concepts, the less translation the reader's brain performs — and the faster a developer can audit generated code for correctness.
Domain Knowledge Belongs in the Declaration
The common thread between better compiler feedback, the three-party model, and cheaper review is semantic content — how much meaning the language lets you encode, and how much of that meaning the compiler verifies.
Current compilers have limited capability to check domain rules. Your service calls a payment gateway — the response can mean charged, declined, fraud-held, or gateway failure. Most languages give you a status code and a body; whether you distinguish "declined" from "fraud hold" depends on your discipline, not the compiler. A User has twenty fields; an API response should expose five of them. Derive the response type by hand, add a field to User next quarter, and now you're trusting every downstream DTO to have been updated.
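TypeScript's mapped types hint at what compiler-checked projection could look like. Here is a sketch with invented types, including where it falls short:

// Twenty-field User, trimmed here for the sketch.
type User = {
  id: string;
  email: string;
  displayName: string;
  passwordHash: string;
  createdAt: Date;
};

// Derived projection: field names and types are checked against User,
// so a rename or type change on User breaks this at compile time.
type UserResponse = Pick<User, "id" | "email" | "displayName">;

// Hand-copied DTO: drifts silently. Rename `displayName` on User and
// this still compiles — it's just quietly out of sync.
type UserResponseByHand = {
  id: string;
  email: string;
  displayName: string;
};

Even Pick only narrows the gap: it catches renames and type changes, but it can't say whether a field added to User next quarter should be exposed. That judgment is domain knowledge, and it's exactly what a richer declaration needs to carry.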
The convention of documenting this is everywhere — Javadoc's @throws, Python's Raises, OpenAPI's response schemas. The problem is that none of it is compiler-visible. You can document four outcomes and handle three. The gap stays invisible until production. Here's a typical pattern:
/**
 * @throws DeclinedError
 * @throws FraudHoldError
 * @throws GatewayError
 */
async function chargeCard(
  req: ChargeRequest,
  gateway: PaymentGateway,
): Promise<Receipt> {
  // ...
}
The return type promises a Receipt — that's the happy path. The three failure modes live in a JSDoc comment the compiler will never read. A caller that ignores all three error cases compiles without a warning. The information is there, but it's decorative.
Here is the same operation in Ruuk, a language designed to include this information where the compiler can verify it:
pub op chargeCard =
  payload req: ChargeRequest
  via gateway: PaymentGateway
  outcomes =
    | Charged of Receipt
    | Declined of DeclineReason
    | FraudHold of ReviewId
    | GatewayError of ErrorDetail
This declares not just inputs but what role each plays (payload is the data being acted on, via is the external system being called) and what outcomes the caller must handle. If you call chargeCard and don't handle the FraudHold outcome, it doesn't compile. The meaning you would have put in a comment is now visible to the compiler — and enforced by it.
The same declaration that gives the compiler more to verify also makes the code faster to cold-read. A developer scanning AI-generated code can evaluate chargeCard's shape in seconds: what it takes, where the data goes, what can go wrong. No implementation diving required.
This design direction asks developers to formalize what they already know — it's on the whiteboard, in the docs, in comments scattered through the codebase. You already know your payment operation has four possible outcomes. The language asks you to type that in. The compiler takes it from there.
Conclusion
Agentic coding hasn't reduced the importance of language design — it's exposed where it needs to grow. The properties that improve the agentic loop are the same ones that improve human review: more meaning in the syntax, more verification in the compiler, less translation between what the code says and what the domain demands.
That points toward a class of language, not a single answer. Ruuk is my attempt to build one — designed from the start for the world where agents write the code and humans verify the shape. Hopefully it won't be the only attempt, and competition here is genuinely good. The industry needs more people thinking about this problem.
The articles that follow make the design concrete. The next piece is a fast tour of the OCaml/F# syntax Ruuk builds on — enough to read the examples without getting lost. After that: how operations and outcomes give the compiler visibility into failure modes, and how projections enforce structural rules when data crosses boundaries. The chargeCard declaration above is the destination. The series shows how you get there.
Ruuk is pre-alpha. What I can show right now is the thinking behind it — and a place to engage with the design before it solidifies. If the ideas in this article resonate, follow along on GitHub and weigh in on the discussions. The best languages get shaped by the people who care about the problems they solve.