Grotto: Where Neander Programs Live

#ai #agents #programming #neander

Last time, in Neander: An Agent-First Programming Language, I published the language but not the one thing a host actually needs in order to put it to work: a runtime reference implementation. I promised it already existed, and that I would show it next time.

Here it is.

It's called Grotto. The naming keeps the theme going: the first Neanderthal fossil was pulled from a grotto — the place "where the Neanderthal lived." Grotto is where Neander programs live and run.

A specification is a plan. A reference implementation is the proof that the plan can be put into action. Grotto is that proof: an embeddable TypeScript library, running on Node.js, that takes a Neander program as plain text and runs it — the whole language, end to end, every expression, every data type and every built-in function the spec defines.

The trusted component

Recall the setup from the last two posts. The agent is on the outside, untrusted, writing small programs. The host is on the inside, with the APIs worth calling. Between them sits the runtime — and the runtime is the one component in the whole arrangement that everyone has to trust.

That is a heavy crown to wear. Grotto's entire design is an argument that the trust is warranted.

It starts with the design process itself: Grotto is architected, not vibe-coded. I am a certified software architect (iSAQB® CPSA-Advanced Level), so on a good day I even know what I am doing. The Grotto implementation started with an architecture specification (arc42, C4) and a technical design document, both co-authored by me and an agent. Only then did a coding agent create the codebase.

It continues with the dependencies — or rather their absence. Grotto has zero runtime dependencies and leans on nothing but the Node built-ins. That is not housekeeping; it is a security boundary. An npm package you never install is a package that can never turn on you: no CVE, no compromised maintainer, no supply-chain attack can reach Grotto through a dependency, because there is not one to reach it through. Which leaves only Grotto's own code, and its design does the rest of the arguing.

The host-facing library is a small dispatcher of under two hundred lines, and it holds no language code at all. Everything that actually touches a stranger's program — the lexer, the parser, the validator, the interpreter — runs somewhere else entirely: in a fresh worker thread, spawned for that one submission and thrown away the moment it's done. Isolation by construction, not by good manners. Add it all up, and the only Grotto code that ever executes on the embedding application's own thread is that small dispatcher — everything else is quarantined in an isolated worker you can kill.
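
The worker-per-submission pattern described above can be sketched with nothing but Node's built-in `node:worker_threads` module. This is a minimal illustration, not Grotto's actual dispatcher: `runInFreshWorker` and `pipelineSource` are hypothetical names, and the worker body is a placeholder that only counts characters where the real lexer, parser, validator and interpreter would run.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical stand-in for the language pipeline (lexer, parser, validator,
// interpreter). It only counts characters, so the sketch stays runnable.
const pipelineSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  parentPort.postMessage({ charCount: workerData.program.length });
`;

// Dispatcher side: one fresh worker per submission, discarded afterwards.
// The program text crosses the thread boundary as plain data (workerData).
function runInFreshWorker(program: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(pipelineSource, {
      eval: true, // treat the first argument as source code, not a file path
      workerData: { program },
    });
    worker.once('message', (result) => {
      void worker.terminate(); // throw the worker away once it has answered
      resolve(result);
    });
    worker.once('error', reject);
  });
}
```

Calling `runInFreshWorker('some program text')` resolves with whatever the worker posted back. Nothing in the worker shares state with the calling thread, which is the property the isolation argument rests on.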

Every program runs under the hard ceilings of a budget system as mandated by the Neander spec — on computation, on memory, and on wall-clock time. Overrun the time limit and the worker is simply killed; the dispatcher keeps the clock, so even a wedged program can't outlast it. The language itself has no recursion and no unbounded loops, so termination was never in doubt to begin with — the budgets are there for everything else.
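
Why the dispatcher keeps the clock can be shown in a few lines. The sketch below is an assumption about the general technique, not Grotto's code: `runWithDeadline`, `TimedOutcome` and `wedged` are hypothetical names. The timer runs on the calling thread, so a busy loop inside the worker cannot delay it, and `worker.terminate()` stops the worker even while it is spinning.

```typescript
import { Worker } from 'node:worker_threads';

type TimedOutcome =
  | { status: 'completed'; value: unknown }
  | { status: 'timed_out' };

// Runs `workerSource` in a fresh worker and enforces the deadline from outside.
function runWithDeadline(
  workerSource: string,
  maxDurationMs: number,
): Promise<TimedOutcome> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    // The clock lives here, on the dispatcher's thread.
    const timer = setTimeout(() => {
      void worker.terminate(); // kill the wedged worker
      resolve({ status: 'timed_out' });
    }, maxDurationMs);
    worker.once('message', (value) => {
      clearTimeout(timer);
      void worker.terminate();
      resolve({ status: 'completed', value });
    });
    worker.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// A deliberately wedged "program": it spins forever and never answers.
const wedged = 'while (true) {}';
```

`runWithDeadline(wedged, 200)` resolves with a `timed_out` outcome once the deadline passes, while a worker that posts a message in time resolves as `completed`.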

The last line of defense is quality assurance. The Grotto specifications and codebase have so far been reviewed by several frontier coding models: Opus 4.6–4.8, GPT-5.5, and, by sheer luck, Fable 5.

Code quality is checked by over 1,300 unit tests (coverage above 90%, though let's not get overexcited about percentages alone) and by more than 750 black-box end-to-end tests (program submissions), a set that keeps growing toward a Neander conformance test suite.

You can always do more, and I will. But the groundwork is laid — enough, I'd hope, that anyone weighing Grotto for a real host application can take it, and Neander with it, seriously.

It's early days, to be clear. The spec is still a draft, the version number starts with a zero, and interfaces might move. But it's real, and it all runs today — a runtime, not a roadmap.

What you do with it

If Neander reads strangely because only agents write it, Grotto reads normally, because only humans host it. You embed the library, hand it your own APIs as provider modules, and point agents at it:

import { Runtime } from 'grotto1';

const runtime = await Runtime.start({
  config: {
    neanderVersion: 1,

    // The ceilings every program runs under.
    thalerBudget: 5000,          // computation
    memoryBudgetKb: 2048,        // memory
    maxDurationMs: 10_000,       // wall-clock time
    perCallTimeoutMs: 3000,      // per individual API call
    maxProgramSizeBytes: 65_536, // static caps, checked before a program runs
    maxNestingDepth: 64,
    maxRepeatLimit: 1000,

    // Your APIs — each a module the runtime hosts on your behalf.
    apiProviderModules: ['/app/providers/bookings.mjs'],
  },
});

const responseEnvelope = await runtime.submitProgram(programFromAgent);

Every submission comes back as a single structured envelope — the value it returned, or the precise way it failed: a program that didn't type-check, a call that errored, a budget that ran out. One program in, one well-formed answer out, every time. Nothing leaks, nothing hangs.
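
On the host side, a single structured envelope lends itself to a discriminated union. The shape below is purely illustrative: the type `ResponseEnvelope` as written here, its `outcome` tags and its field names are assumptions made for this sketch and are not taken from Grotto's API. It only shows how one value can carry either the result or one of the failure kinds named above.

```typescript
// Hypothetical envelope shape; the tags and field names are illustrative,
// not Grotto's actual response format.
type ResponseEnvelope =
  | { outcome: 'ok'; value: unknown }
  | { outcome: 'validation_error'; message: string }
  | { outcome: 'call_error'; message: string }
  | { outcome: 'budget_exceeded'; budget: 'computation' | 'memory' | 'time' };

// Exhaustive handling: the compiler flags any outcome the host forgets.
function summarize(envelope: ResponseEnvelope): string {
  switch (envelope.outcome) {
    case 'ok':
      return `returned ${JSON.stringify(envelope.value)}`;
    case 'validation_error':
      return `rejected before running: ${envelope.message}`;
    case 'call_error':
      return `API call failed: ${envelope.message}`;
    case 'budget_exceeded':
      return `ran out of ${envelope.budget} budget`;
  }
}
```

Because every branch returns a plain string, the host can log or relay the result to the agent without ever inspecting a thrown exception.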

Field Notes from the Grotto

There is far more to cover. How an agent discovers your APIs at runtime. The budget system that keeps a program from ever running up a bill. The worker isolation that lets you run a stranger's code at all. Each is worth a post of its own — so I'm starting a new series on the Neander language and its runtime, feature by feature.

In the meantime, Grotto is on GitHub, the license is permissive, and the floor is open. Embed it into your app, and let me know what breaks.