John Samuel
From Multilingual Syntax to Multilingual Runtimes: Taking a Human-Language-First Language to the Web

In my previous post, I introduced multilingual, an experimental programming language where the same abstract syntax tree can be written in English, French, Spanish, and other human languages while preserving identical semantics. That work focused on the surface of the language: how we let people read and write code in the language they are most comfortable with, without changing what the program does.

In this follow-up, I want to go one step further and ask: what happens when you bring this idea to the Web and to WebAssembly (WASM)?

  • Can a single semantic core, expressed in multiple human languages, also have a single, portable runtime story?
  • How far can we push this idea before “multilingual” stops being just a toy language and starts looking like a serious experimentation platform for human-language–aware tooling?

To explore these questions, I’ve been experimenting with compiling the multilingual interpreter and tooling to WASM and exposing it through a browser playground.


Recap: One Semantic Core, Many Human Languages

The core idea of multilingual is simple to state but surprisingly rich to work with: you write code in different human languages, but it all maps to the same underlying AST and semantics.

In practice, that means:

  • There is a single, language-agnostic core language: control flow, expressions, and data structures live here.
  • Human languages appear as mappings from localized keywords and identifiers to that core: for example, if / si / si (English/French/Spanish) all map to the same conditional node.
  • The interpreter never “cares” whether your source was French or English; by the time it runs, it is operating on the unified AST.

This separation is what enables multilingual to be more than a localized syntax layer on top of an existing language: it is an experiment in treating human language as a first-class dimension of the programming environment itself.
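As a concrete illustration, the keyword-mapping layer can be thought of as a per-language lookup table that normalizes localized keywords to core tokens before parsing. This is a hedged sketch, not the project's actual implementation; the table contents and function names are illustrative:

```typescript
// Hypothetical sketch: localized keywords normalize to one core token set.
// The real multilingual implementation may structure this differently.
type CoreKeyword = "if" | "else" | "while" | "function";

const languagePacks: Record<string, Record<string, CoreKeyword>> = {
  en: { if: "if", else: "else", while: "while", function: "function" },
  fr: { si: "if", sinon: "else", "tant que": "while", fonction: "function" },
  es: { si: "if", sino: "else", mientras: "while", "función": "function" },
};

// Map a surface keyword in a given human language to its core token.
// Unknown words pass through unchanged (identifiers, literals, etc.).
function normalizeKeyword(word: string, lang: string): string {
  const pack = languagePacks[lang];
  return pack?.[word] ?? word;
}

// "si" is the conditional keyword in both French and Spanish, and both
// map to the same core token, so the resulting AST is identical.
console.log(normalizeKeyword("si", "fr")); // "if"
console.log(normalizeKeyword("fonction", "fr")); // "function"
```

Because the mapping is pure data, adding a new human language is a matter of contributing one more table rather than touching the parser's logic.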


Why WebAssembly Matters for a Multilingual Language

If you have a language whose goal is to welcome developers in multiple human languages, you want the runtime to be equally portable.

WebAssembly is particularly interesting in this context because it gives us:

  • A sandboxed, predictable execution model that runs in modern browsers and outside the browser in WASI environments.
  • A well-defined binary target for compilers and interpreters written in languages like C, C++, and Rust.
  • A path to reuse the same core, even if the surface syntax varies dramatically across human languages.

For an experimental language like multilingual, WASM offers two big advantages:

  1. You can ship the whole interpreter and tooling directly into the browser, so learners only need a URL, not an installation guide.
  2. You can treat the Web as a live laboratory to test multilingual error messages, editor integrations, and teaching scenarios.

In other words, if the goal is to make programming more approachable in more human languages, WASM lets us meet people where they already are: in the browser.


The Playground: A Multilingual Interpreter in Your Browser

The playground is the first concrete step in that direction: it loads the language core in the browser and lets you write and run programs in different human languages side by side.

Playground: https://johnsamuel.info/multilingual/playground.html

At a high level, the architecture looks like this:

  • The core interpreter is compiled to WebAssembly using a systems language toolchain.
  • The browser page handles text input, language selection, and UI in plain HTML/CSS/JavaScript.
  • When you run code, the UI sends your multilingual source to the WASM module, which parses it, builds the AST, and executes it.

This separation mirrors the language design:

  • The UI is responsible for dealing with the user’s language preferences, fonts, directionality, and input quirks.
  • The WASM module is responsible for understanding the core language and enforcing identical semantics regardless of human language.

Because the interpreter runs entirely in the browser, you also get some nice side-effects:

  • No server-side execution is required, which simplifies deployment.
  • You can experiment with new keyword mappings and language packs and immediately see their behavior.
  • The environment is inherently sandboxed, piggybacking on the browser’s security model.
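The UI/module split described above can be sketched as a narrow interface between the page and the compiled core. The exported function names here are hypothetical (the real WASM module's API may differ), and a stub stands in for the compiled binary so only the calling convention is shown:

```typescript
// Hypothetical contract between the browser UI and the WASM core.
// The actual multilingual module may expose a different API.
interface MultilingualCore {
  // Parse and execute source written in the given human language;
  // returns captured output or a diagnostic string.
  run(source: string, lang: string): { ok: boolean; output: string };
}

// In the real playground this object would be built from the result of
// WebAssembly.instantiate(...); here a stub stands in for the binary.
const core: MultilingualCore = {
  run(source, lang) {
    return { ok: true, output: `ran ${source.length} chars of ${lang} source` };
  },
};

// The UI layer only handles input, language selection, and display;
// it never interprets the program itself.
function onRunClicked(editorText: string, selectedLang: string): string {
  const result = core.run(editorText, selectedLang);
  return result.ok ? result.output : `error: ${result.output}`;
}

console.log(onRunClicked('affiche("bonjour")', "fr"));
```

Keeping the boundary this narrow is what lets the same core later back a WASI-based desktop tool without UI changes leaking into the interpreter.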

Multilingual Error Messages and Diagnostics

Once you have a language that can be written in multiple human languages and a runtime that can be delivered as WASM, the next natural step is tooling.

In a traditional language, error messages are often an afterthought. In a multilingual language, they become central:

  • Syntax errors can be reported in the same human language as the source code.
  • Error hints can guide beginners using vocabulary that makes sense in their linguistic context.
  • In mixed-language codebases, you can imagine selectively switching diagnostic languages without changing the underlying program.

A WASM-based playground is a great place to prototype this because:

  • You can build a small, interactive REPL where users see errors in real time.
  • You can experiment with toggling error-message languages without reloading or recompiling.
  • You can embed teaching material directly in the browser, close to the diagnostics.

From an implementation standpoint, this suggests a clear design direction:

  • Keep error codes and structures language-agnostic in the core.
  • Maintain localized message catalogs and explanations in separate resources, possibly keyed by the same identifiers used for keywords.
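That design direction might look roughly like the following, with language-agnostic error codes in the core and per-language message catalogs layered on top. The names and error shape here are illustrative assumptions, not the project's actual diagnostics API:

```typescript
// The core emits stable, language-agnostic error codes with structured data.
type Diagnostic = { code: "E_UNEXPECTED_TOKEN"; token: string; line: number };

// Localized catalogs are plain data, keyed by the same stable codes.
const catalogs: Record<string, Record<Diagnostic["code"], (d: Diagnostic) => string>> = {
  en: {
    E_UNEXPECTED_TOKEN: (d) => `Unexpected token "${d.token}" on line ${d.line}.`,
  },
  fr: {
    E_UNEXPECTED_TOKEN: (d) => `Jeton inattendu « ${d.token} » à la ligne ${d.line}.`,
  },
};

// Rendering picks a catalog at display time; the core never changes,
// so switching the diagnostic language is a pure UI operation.
function renderDiagnostic(d: Diagnostic, lang: string): string {
  const catalog = catalogs[lang] ?? catalogs.en;
  return catalog[d.code](d);
}

const diag: Diagnostic = { code: "E_UNEXPECTED_TOKEN", token: "sinon", line: 3 };
console.log(renderDiagnostic(diag, "fr"));
```

Because the diagnostic is structured data until the last moment, toggling the error-message language in the playground needs no reparse and no recompile.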

Thinking Ahead: Multilingual Tooling, IDEs, and Beyond

WASM is not just a deployment target; it is a bridge between language runtimes and modern developer tools.

Once you have the multilingual core compiled to WASM, several possibilities open up:

  • Embeddable interpreters in online textbooks and tutorials, where each reader can choose their preferred human language.
  • Language-server–like functionality running in the browser, providing autocompletion and diagnostics for multilingual code.
  • Integration with “traditional” editors via a WASI runtime, so the same core can power both web and desktop experiences.

This also connects to broader work on multilingual programming and natural-language–friendly programming environments. The long-term question is not just “can we localize keywords?” but “what does a programming ecosystem look like when human language is no longer an afterthought?”


What’s Next?

A few directions I am exploring and would love feedback on:

  • Richer language packs: extending and experimenting with scripts and writing systems that challenge our assumptions about syntax.
  • More powerful WASM integrations: exploring how far we can push the browser-based environment (e.g., file-like APIs via WASI, richer visualization of program state).
  • Community-contributed mappings: making it easier for contributors to define and share mappings for their own languages via simple configuration files in the repository.

If you are interested in any of these ideas, want to try the language, or have thoughts on making programming more inclusive across human languages, please reach out.

I would love to hear how you imagine using a human-language–first programming environment in your own work or teaching.

Top comments (2)

Osama Alghanmi

This resonates with what we've been building with our programming language Almadar.

You're solving the problem at the syntax layer: one semantic core, many human-language frontends, same AST. We've been solving what turns out to be the same structural problem at the application layer: one semantic core, many platform projections, same intermediate representation.

The Shared Insight: Separate Meaning from Surface

Your architecture is French keywords → AST → execution and English keywords → AST → execution, where the AST is the stable semantic core that doesn't care which human language produced it.

Ours is .orb schema → OIR → TypeScript app and .orb schema → OIR → Python app, where the OIR (Orbital Intermediate Representation) is a language-agnostic IR that captures entities, state machines, guards, effects, and event flows — and doesn't care which platform consumes it.

Your model:
  French source  ─┐
  English source ─┤→ Shared AST → Interpreter → Same behavior
  Spanish source ─┘

Our model:
  Same .orb schema → OIR → TypeScript backend → React + Express app
                         → Python backend    → Pydantic + FastAPI app
                         → Rust backend      → Axum + egui app

In both cases, the semantic core is the invariant. The surface — whether that's human-language keywords or platform-specific code — is a derived projection.

S-Expressions as the Universal Notation

Here's where it gets interesting. We also chose a Lisp-derived notation for our semantic core — S-expressions encoded as JSON arrays:

["and",
  ["=", "@entity.status", "review"],
  [">", ["count", "@entity.sections"], 0]]

This is the JSON equivalent of (and (= entity.status "review") (> (count entity.sections) 0)). We chose it for the same reason S-expressions have survived since 1958: they're homoiconic — the representation is the AST. No parser ambiguity, no precedence rules, no surface syntax to disagree about. The Rust compiler validates them, the TypeScript runtime evaluates them, and an AI agent can reason about them — all operating on the same structure.
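A toy evaluator over this JSON-array encoding makes the homoiconicity concrete: the data structure you deserialize is already the tree you walk. This is a simplified sketch for illustration, not our production evaluator; the `@entity.` path convention is taken from the example above:

```typescript
// Minimal evaluator for JSON-encoded S-expressions such as
// ["and", ["=", "@entity.status", "review"], [">", ["count", "@entity.sections"], 0]]
type SExpr = string | number | boolean | SExpr[];

function evaluate(expr: SExpr, entity: Record<string, unknown>): unknown {
  // Strings starting with "@entity." are paths into the entity.
  if (typeof expr === "string" && expr.startsWith("@entity.")) {
    return entity[expr.slice("@entity.".length)];
  }
  if (!Array.isArray(expr)) return expr; // literal

  const [op, ...args] = expr;
  const vals = args.map((a) => evaluate(a, entity));
  switch (op) {
    case "and": return vals.every(Boolean);
    case "=": return vals[0] === vals[1];
    case ">": return (vals[0] as number) > (vals[1] as number);
    case "count": return (vals[0] as unknown[]).length;
    default: throw new Error(`unknown operator: ${String(op)}`);
  }
}

const guard: SExpr = ["and",
  ["=", "@entity.status", "review"],
  [">", ["count", "@entity.sections"], 0]];

console.log(evaluate(guard, { status: "review", sections: ["intro"] })); // true
```

There is no tokenizer and no grammar anywhere in that sketch, which is exactly the property that lets multiple independently written runtimes agree on the semantics.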

Your multilingual project demonstrates the same principle from the other direction: if you strip away the surface keywords and word order, what remains is a universal semantic structure that any frontend can target.

Where This Converges: LLM-Native Development

Your point about vibecoding becoming genuinely multilingual is exactly right—and it connects to something we're seeing in practice.

In our system, an AI agent can:

  1. Take a natural language description (in any language)
  2. Generate a .orb schema (which is already language-neutral JSON)
  3. Compile it into a working app on any target platform

The schema is the semantic bridge. The agent doesn't generate TypeScript or Python — it generates the meaning (entities, state machines, transitions, effects), and the compiler projects that meaning onto whatever platform you need.

Your system does the analogous thing: an LLM generates code using French keywords, and multilingual executes it directly because the semantic core doesn't care about the surface language. No translation step.

Both approaches point to the same future: the semantic layer is the stable contract, and everything above it (human language, platform language) is a projection that can vary freely.

The WASM Angle

Your move to compile the interpreter to WASM is interesting because it collapses the deployment question entirely — the semantic core runs anywhere a browser does, no install needed. That's a powerful property for a teaching tool.

We solve the same "run everywhere" problem differently: the same schema gets compiled ahead of time into platform-native code (TypeScript for the web, Python for ML backends, Rust for native). The semantic core doesn't travel as a portable binary — it travels as a portable specification (the OIR) that each backend interprets at compile time. Different strategy, same goal: write the meaning once, run it wherever you need to.

It would be interesting to see whether WASM becomes the universal runtime for these kinds of semantic cores, or whether ahead-of-time compilation to native targets wins out. Probably both, depending on the use case.

A Design Question

Keywords translate cleanly — si for if, fonction for function — because they're part of the language spec and you control both sides of the mapping. But user-defined names don't have that property. If a developer writes a function called calculer_total in the French frontend, what does a developer using the English frontend see when they import it?

This seems like the hard problem in multilingual PL design. You can make the language multilingual. Still, the moment developers name their own functions, variables, and types, the code becomes monolingual again — in whichever language the author happened to think in. In a team where one person writes French identifiers and another writes Japanese ones, the shared AST is still shared, but the readability fragments.

Curious whether you've thought about this — whether the frontend layer could eventually carry identifier aliases or annotations alongside the AST, so the same function could surface as calculer_total in one frontend and calculate_total in another while pointing at the same node.
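Purely as a thought experiment, such an alias layer might be a table carried alongside the AST: each node keeps one canonical identifier, and each frontend may register display aliases for it, falling back to the authored name. Everything here is hypothetical:

```typescript
// Thought-experiment sketch: per-language display aliases over one
// canonical identifier. Nothing like this exists in multilingual today.
type AliasTable = Record<string, Record<string, string>>; // canonical -> lang -> alias

const aliases: AliasTable = {
  calculer_total: { en: "calculate_total", fr: "calculer_total" },
};

// Rendering an identifier for a given frontend falls back to the
// authored (canonical) name when no alias has been contributed.
function displayName(canonical: string, lang: string): string {
  return aliases[canonical]?.[lang] ?? canonical;
}

console.log(displayName("calculer_total", "en")); // "calculate_total"
console.log(displayName("calculer_total", "ja")); // falls back: "calculer_total"
```

The fallback matters: aliases would accumulate gradually and socially, so the table can never be a hard requirement for the code to remain readable.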

On Community-Contributed Mappings

Of your "What's Next" directions, the community-contributed language mappings resonate most with us — because we've already built the infrastructure for a version of the same problem.

Our system has a pattern registry — a JSON configuration file that maps each semantic pattern type (e.g., entity-table or form-section) to a concrete React component (in our almadar/ui package), with its props schema and event contracts declared alongside it. Contributors can add new patterns without touching the compiler or runtime. The registry is the contract; the implementation is pluggable.

{
  "entity-table": {
    "component": "DataTable",
    "props": { "columns": "string[]", "entity": "string" },
    "eventContract": {
      "emits": [{ "event": "EDIT", "trigger": "action" }]
    }
  }
}

Your language mappings are structurally similar — a configuration file that maps human-language keywords to semantic core operations, contributed by the community, without forking the interpreter. The shape of the problem is the same: how do you let a community extend the surface layer via config, while the semantic core stays stable and validated?

The thing we learned the hard way: you need a validation layer between the contributed config and the core. Without it, someone contributes a mapping that's syntactically valid but semantically broken (maps a keyword to the wrong AST node, misses an edge case in their language's grammar). We built compile-time validation that checks every pattern entry against the event contracts. If the mapping declares it emits EDIT but the underlying component doesn't wire that event, the compiler catches it.
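In the multilingual setting, that check might amount to validating every contributed language pack against the core's known constructs before it can reach the interpreter: no mapping may target an unknown construct, and no construct may be left uncovered. A simplified sketch of the idea, with made-up names:

```typescript
// Sketch: validate a contributed language pack against the core's known
// construct names before it can ever reach the interpreter. The construct
// set and pack shape are hypothetical.
const coreConstructs = new Set(["if", "else", "while", "function"]);

type LanguagePack = Record<string, string>; // localized keyword -> core construct

function validatePack(pack: LanguagePack): string[] {
  const errors: string[] = [];
  const targets = new Set<string>();
  for (const [keyword, target] of Object.entries(pack)) {
    if (!coreConstructs.has(target)) {
      errors.push(`"${keyword}" maps to unknown construct "${target}"`);
    }
    targets.add(target);
  }
  // Every core construct must be covered, or the pack is incomplete.
  for (const construct of coreConstructs) {
    if (!targets.has(construct)) errors.push(`no mapping covers "${construct}"`);
  }
  return errors;
}

console.log(validatePack({ si: "if", sinon: "else", "tant que": "while", fonction: "function" }));
console.log(validatePack({ si: "if", oops: "loop" })); // reports problems
```

Running this in CI against every contributed pack would catch the "syntactically valid but semantically broken" case before it ships.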

If you're building a contribution pipeline for language packs, that validation step — "does this mapping actually produce correct AST nodes for all the constructs it claims to cover?" — might be worth prioritizing early. Happy to share how we structured ours if it's useful.

Really compelling work. The fact that "one semantic core, many surfaces" is emerging independently in both PL design and application architecture feels like something important is converging.

John Samuel

In the current design, user-defined identifiers (like function names) live in the shared AST exactly as they are authored, so an English-frontend developer importing code written with French identifiers will still see calculer_total as the function name. I may explore LLM-assisted identifier translation in the future, but that is not the current goal.

Contributors only edit config that says “this token/grammar pattern means that AST node,” and an automated pipeline checks those mappings against a canonical test suite. If a pack ever produces an invalid or surprising AST, it fails validation and never reaches the interpreter. That way the surface layer can evolve socially, while the semantic layer stays mechanically stable and checkable.

Thanks for your feedback. It is interesting to see that you are tackling a closely related problem in much the same way.