Neander and Grotto: Beyond Code Mode

#ai #agents #programming #neander

Field Notes from the Grotto starts here — a feature-by-feature tour of the Neander language and its runtime, Grotto. And I am opening with the biggest feature of them all: Neander itself. Why create a whole new language when we already have an abundance of well-established programming languages at our disposal?

In a previous post I made the case that the seam between systems is turning into a language: instead of calling your tools one at a time, an agent should send you a small program and let it orchestrate the work on your side. That idea has a name, code mode, and it is not mine. By now it is not even contentious. Others arrived at it from their own directions, real solutions already ship it, and the underlying claim (that a model does better writing code than emitting tool calls) has been measured, not just asserted. So the what is settled. This post is about a narrower quarrel with the how.

All existing solutions I came across share an answer that is, frankly, the obvious one. Take a well-known language the model already writes fluently, generate an API from your tools, and run the agent's code in a sandbox. Pragmatic. Available today. And there is serious effort behind making it safe — an entire industry of ways to run untrusted code: lightweight virtual machines, isolated containers, syscall firewalls, network proxies whose sole job is to say no. Real engineering, and it delivers.

So why did I not reach for any of that? Why start from an empty grammar instead?

Because all of it shares one shape — safety by subtraction — and I wanted a different one.

Safe by construction

Every one of those approaches begins with a language that can do anything, then spends its effort taking things away: walling off the filesystem, blocking the network, killing the process when the clock runs out. The language is the threat, and safety comes from a prison built around it.

Neander is no such threat. It cannot run forever — it is not Turing-complete, there is no recursion, every loop is statically bounded, and termination is decided before the program runs. It cannot reach out — there is no file, no socket, no system call anywhere in the grammar. It cannot run up a bill — every program runs under hard ceilings on computation, memory, and time. Whole categories of exploit — sandbox escapes, privilege escalation, data exfiltration — simply do not apply, because the capability they would abuse was never there.
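To make "termination is decided before the program runs" concrete, here is a minimal Python sketch under loud assumptions. It models a hypothetical toy language with just two constructs, a host call and a loop whose count is a literal. This is not Neander's grammar, which this post does not show, and `Call`, `Repeat`, `max_steps`, and `admit` are invented for illustration. Because nothing can recurse and every loop count is known up front, a checker can compute an upper bound on work by walking the syntax tree, then refuse the program before a single step runs.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical toy AST, NOT Neander's grammar: just enough to show static bounds.

@dataclass(frozen=True)
class Call:
    """One host call; costs one unit of work."""
    name: str

@dataclass(frozen=True)
class Repeat:
    """A loop whose iteration count is a literal, fixed before execution."""
    times: int
    body: Tuple["Stmt", ...]

    def __post_init__(self) -> None:
        if self.times < 0:
            raise ValueError("loop count must be non-negative")

Stmt = Union[Call, Repeat]

def max_steps(stmts: Tuple[Stmt, ...]) -> int:
    """Upper bound on work, computed by walking the tree without running it.

    The tree is finite and the toy language has no recursion, so this walk
    always terminates, and the result bounds every possible execution.
    """
    total = 0
    for stmt in stmts:
        if isinstance(stmt, Call):
            total += 1
        else:
            total += stmt.times * max_steps(stmt.body)
    return total

def admit(stmts: Tuple[Stmt, ...], step_ceiling: int) -> bool:
    """Accept a program only if its static bound fits under the ceiling."""
    return max_steps(stmts) <= step_ceiling

# Usage: one call, then 10 iterations of (1 call + 3 inner calls).
program = (
    Call("fetch"),
    Repeat(10, (Call("lookup"), Repeat(3, (Call("score"),)))),
)
print(max_steps(program))  # 41
print(admit(program, 50))  # True
print(admit(program, 40))  # False
```

In this toy, the ceiling check is arithmetic over the program text rather than a timer racing a live process, which is exactly the contrast with safety by subtraction.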

There is no prison because there is no prisoner. The sandbox approach asks you to trust the cage. Neander's safety is the absence of anything that would need a cage. The less a language can do, the less can go wrong — and the less you have to take on faith. The entire sandbox industry exists to contain general-purpose code; Neander opts out of needing it.

Uniform by construction

No matter what program the agent submits, the answer comes back in the same form: a single response envelope, defined by the language itself. It carries either the value the program produced or a precise account of why it produced none. That uniformity holds because no failure is allowed to escape unstructured: errors raised mid-run and exhausted budgets are caught and classified rather than left to surface however they please, and even an invalid program that never executes still produces a response. Each envelope also carries metadata, including the resources the program consumed and the limits the runtime enforces. The agent learns the very ceilings it operates under from the responses it receives.
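One way to picture the envelope is a small Python sketch. The field names (`ok`, `value`, `error_kind`, `usage`, `limits`), the three error kinds, and the helpers `respond` and `summing_run` are all hypothetical; the Neander spec defines the real shape. What the sketch does show is the discipline just described: validation failures, mid-run errors, and exhausted budgets are all caught and turned into the same structure, which also reports usage and limits.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical envelope shape: field names are illustrative, not Neander's spec.
@dataclass
class Envelope:
    ok: bool
    value: Any = None
    error_kind: Optional[str] = None    # "invalid", "runtime", or "budget"
    error_detail: Optional[str] = None
    usage: Dict[str, int] = field(default_factory=dict)   # resources consumed
    limits: Dict[str, int] = field(default_factory=dict)  # ceilings enforced

class InvalidProgram(Exception):
    """Raised by validation; the program never starts."""

class BudgetExhausted(Exception):
    """Raised mid-run when a ceiling is hit."""

def respond(validate: Callable[[], None],
            run: Callable[[Dict[str, int], Dict[str, int]], Any],
            limits: Dict[str, int]) -> Envelope:
    """Always return an Envelope; no exception escapes to the agent."""
    usage = {"steps": 0}
    try:
        validate()  # an invalid program is rejected before run() is called
        value = run(usage, limits)
        return Envelope(ok=True, value=value, usage=usage, limits=limits)
    except InvalidProgram as exc:
        kind, detail = "invalid", str(exc)
    except BudgetExhausted as exc:
        kind, detail = "budget", str(exc)
    except Exception as exc:  # any other mid-run failure is classified too
        kind, detail = "runtime", f"{type(exc).__name__}: {exc}"
    return Envelope(ok=False, error_kind=kind, error_detail=detail,
                    usage=usage, limits=limits)

def summing_run(n: int):
    """Toy program body: sums 0..n-1, charging one step per iteration."""
    def run(usage: Dict[str, int], limits: Dict[str, int]) -> int:
        total = 0
        for i in range(n):
            usage["steps"] += 1
            if usage["steps"] > limits["steps"]:
                raise BudgetExhausted(f"step ceiling {limits['steps']} reached")
            total += i
        return total
    return run

def reject() -> None:
    raise InvalidProgram("unknown identifier")

limits = {"steps": 100}
print(respond(lambda: None, summing_run(10), limits))
print(respond(lambda: None, summing_run(1000), limits).error_kind)  # budget
print(respond(reject, summing_run(10), limits).error_kind)          # invalid
```

In this sketch, the catch-all branch is what guarantees uniformity: anything the runner did not anticipate is still labeled and returned in the envelope rather than leaked as a raw traceback.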

The uniform response envelope is not simply a convenience. The whole point of sending code instead of a stream of tool calls was to keep the agent's context clean — one compact result in, rather than every intermediate step piling up. A uniform envelope is what makes that payoff real: the agent gets back a single machine-readable verdict it always knows how to read, and only that verdict costs it any context. It never has to parse prose, squint at a stack trace, or reconcile different failure formats. It reads the envelope, and it knows exactly where it stands.

Open by construction

The existing solutions tend to arrive bolted to something — a cloud you deploy on, a framework you adopt, a tool protocol you have to speak. Neander is a specification, with a conformance suite growing up beside it. Anyone can implement a runtime; Grotto is simply the first. Nothing ties you to one vendor, and nothing ties you to one tool protocol — the host embeds a Neander runtime by wiring it into its own application. A standard you can build on, not a product you sign up for.

Next from the Grotto

That is the case in outline. The rest of the series is the case in detail — one feature per entry, each of them a piece of the argument above made concrete. First up, the one that makes the whole inversion possible: how an agent finds out what your APIs even are, at runtime, without ever carrying a catalog of them around.

In the meantime, read the Neander spec, embed Grotto in your own app, and tell me where it falls short.