
John Samuel

A multilingual programming language where the same AST runs English, French, Spanish, etc.

What if the only thing that changed between English and French code was the surface syntax, not the interpreter?

tl;dr

  • One tiny semantic core, many language-specific “frontends”.
  • Code written in French, English, etc. maps to the same AST and runtime.
  • You can swap surface languages without changing the underlying program.

Most programming languages quietly assume that you think in English.

Keywords, error messages, tutorials, library names – everything is shaped around one language, even when the underlying semantics are universal.

I’ve been experimenting with a different approach: keep a tiny, shared semantic core, and let the surface syntax (keywords, word order) adapt to the programmer’s natural language.

The result is an interpreter where the same AST can execute code written in French, English, Spanish, Arabic, Japanese, and more – without a translation step.

Repository: https://github.com/johnsamuelwrites/multilingual


Why separate surface syntax from the core?

When we teach or learn programming, we often say “syntax is just sugar, what really matters is semantics.”

But in practice, syntax is where beginners feel the most friction – especially if they’re also learning English at the same time.

The idea here is to push that intuition to its extreme:

  • Keep one small semantic core (conditions, functions, variables, blocks, etc.).
  • Describe each human language as a thin “frontend” that maps its own keywords and patterns into that core.
  • Make these mappings explicit and inspectable, instead of hard-coding English everywhere.

This lets you ask questions like: what would this program look like if all the keywords were in French or Spanish? – while still running through the same interpreter pipeline.


How the interpreter is structured

At a high level, the interpreter is split into three layers:

  • Surface syntax layer – Knows about keywords and word order for a specific human language (e.g., if vs si, print vs afficher). It only parses and normalizes; it never executes.
  • Semantic core – A small, language‑agnostic core with expressions, statements, control flow, function calls, and environments. This is where the AST lives.
  • Runtime / execution engine – Walks the core AST and evaluates it. At this point, the runtime is completely blind to the human language you used.

Each surface syntax module is responsible for normalizing its input into the same core representation.

That’s where most of the interesting language-design questions arise.
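To make the split concrete, here is a minimal sketch of what a semantic core and a runtime could look like. It is not the code from the repository: the node names (Const, Var, Greater, Print, If) and the evaluate function are hypothetical, chosen only to show that the execution layer has no notion of surface keywords.

```python
from dataclasses import dataclass

# --- Semantic core: language-agnostic AST nodes (hypothetical names) ---
@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Greater:
    left: object
    right: object

@dataclass(frozen=True)
class Print:
    arg: object

@dataclass(frozen=True)
class If:
    test: object
    body: tuple

# --- Runtime: walks the core AST; it never sees "if" or "si" ---
def evaluate(node, env, out):
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Greater):
        return evaluate(node.left, env, out) > evaluate(node.right, env, out)
    if isinstance(node, Print):
        out.append(evaluate(node.arg, env, out))  # collect output instead of printing
        return None
    if isinstance(node, If):
        if evaluate(node.test, env, out):
            for stmt in node.body:
                evaluate(stmt, env, out)
        return None
    raise TypeError(f"unknown node: {node!r}")

# The tree a surface layer would hand over for "if x > 0: print(...)".
program = If(Greater(Var("x"), Const(0)), (Print(Const("positive")),))
output = []
evaluate(program, {"x": 5}, output)
# output is now ["positive"]
```

A surface syntax module, in this picture, is anything that can produce such a tree from source text. Nothing in evaluate would change when a new human language is added.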


A tiny example

Suppose you want to express a simple conditional.

In an English-shaped surface syntax, you might write something that looks like Python:

if x > 0:
    print("positive")

In a French surface syntax, the structure is analogous but with French keywords (this is illustrative – see the repo for the real examples):

si x > 0:
    afficher("positif")

Both of these are parsed into the same kind of core AST node: a conditional with a predicate and a block. Only the string literal, which is program data rather than syntax, differs.

The runtime never needs to know whether the original keyword was if or si.

The same idea extends to loops, function definitions, and so on.
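One way to picture the keyword mapping is as a per-language lookup table applied during tokenization. The sketch below is an assumption about how such a step could work, not the project's actual implementation: the KEYWORDS table, the concept names COND and PRINT, and the normalize function are all made up for illustration.

```python
import re

# Hypothetical keyword tables: surface word -> core concept.
KEYWORDS = {
    "en": {"if": "COND", "print": "PRINT"},
    "fr": {"si": "COND", "afficher": "PRINT"},
}

# A token is a double-quoted string, a word, or a single punctuation character.
TOKEN = re.compile(r'\s*("[^"]*"|\w+|[^\w\s])')

def normalize(source, lang):
    """Tokenize one line and replace surface keywords with core concept names."""
    table = KEYWORDS[lang]
    return [table.get(tok, tok) for tok in TOKEN.findall(source)]

english = normalize("if x > 0:", "en")
french = normalize("si x > 0:", "fr")
# Both are ["COND", "x", ">", "0", ":"]
```

After this step the two headers are indistinguishable, so everything downstream can be shared. The lookup is per language, which means si stays an ordinary identifier in English-shaped code.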


Another tiny example

Suppose you want to print the numbers from 1 to 3 if a flag is set.

English-shaped surface syntax

if enabled:
  for n in 1..3:
    print(n)

French-shaped surface syntax (illustrative)

si actif:
  pour n dans 1..3:
    afficher(n)

Apart from the name of the flag, both programs are parsed into the same core AST: a conditional node whose body is a loop over a range, whose body in turn is a call to a print function. As before, the runtime cannot tell whether the loop was written with for or pour.
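The nested example can be sketched end to end with a toy parser. Everything here is hypothetical and covers only the three line shapes used above: the KEYWORDS table, the list-based node layout, and the parse and run functions are illustrations, and the sketch assumes that 1..3 is an inclusive range. Identifiers are kept exactly as written, so the flag is named enabled in both programs to make the trees comparable.

```python
import re

# Hypothetical keyword tables: core concept -> surface word.
KEYWORDS = {
    "en": {"if": "if", "for": "for", "in": "in", "print": "print"},
    "fr": {"if": "si", "for": "pour", "in": "dans", "print": "afficher"},
}

def parse(source, lang):
    """Parse the tiny subset shown above into nested lists (a stand-in core AST)."""
    k = {name: re.escape(word) for name, word in KEYWORDS[lang].items()}
    patterns = [
        ("If", re.compile(rf"{k['if']} (\w+):$")),
        ("For", re.compile(rf"{k['for']} (\w+) {k['in']} (\d+)\.\.(\d+):$")),
        ("Print", re.compile(rf"{k['print']}\((\w+)\)$")),
    ]
    root = []
    stack = [(-1, root)]  # (indentation of the block header, block body)
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        while stack[-1][0] >= indent:  # close blocks on dedent
            stack.pop()
        for kind, pattern in patterns:
            m = pattern.match(line.strip())
            if m:
                break
        else:
            raise SyntaxError(line)
        if kind == "If":
            node = ["If", m.group(1), []]
        elif kind == "For":
            node = ["For", m.group(1), int(m.group(2)), int(m.group(3)), []]
        else:
            node = ["Print", m.group(1)]
        stack[-1][1].append(node)
        if kind != "Print":
            stack.append((indent, node[-1]))  # following lines go into this body
    return root

def run(nodes, env, out):
    """Execute the core tree; no surface keyword appears in this function."""
    for node in nodes:
        if node[0] == "If":
            if env[node[1]]:
                run(node[2], env, out)
        elif node[0] == "For":
            _, var, lo, hi, body = node
            for i in range(lo, hi + 1):  # assumption: the range is inclusive
                run(body, {**env, var: i}, out)
        else:  # Print
            out.append(env[node[1]])

EN = "if enabled:\n  for n in 1..3:\n    print(n)\n"
FR = "si enabled:\n  pour n dans 1..3:\n    afficher(n)\n"

same_tree = parse(EN, "en") == parse(FR, "fr")  # True
printed = []
run(parse(FR, "fr"), {"enabled": True}, printed)
# printed is now [1, 2, 3]
```

This sketch only swaps keywords. Languages with a different word order would need more than a lookup table in the surface layer.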

Why this might be useful

This is still an experiment, but a few potential use cases are emerging:

  • Teaching in non-English contexts

    Instructors can show code with keywords in the students’ own language, while still having a single, shared semantic model under the hood.

  • Research on multilingual PL design

    It becomes easier to compare how different natural languages “want” to express the same control structures, and where word-order differences start to matter.

  • Accessibility and experimentation

    People can prototype their own keyword sets or dialects without forking the whole interpreter, as long as they can map back to the core.
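As an illustration of the last point, a custom keyword set could be plain data that is validated against the core before it is accepted. This is a sketch under assumptions, not the project's API: CORE_CONCEPTS, REGISTRY, and register_language are hypothetical names, and the Spanish keywords are only an example.

```python
# Hypothetical list of concepts every surface language must cover.
CORE_CONCEPTS = frozenset({"COND", "LOOP", "IN", "PRINT"})

REGISTRY = {}

def register_language(code, table):
    """Accept a keyword table only if it maps onto the core, no more and no less."""
    targets = set(table.values())
    missing = CORE_CONCEPTS - targets
    unknown = targets - CORE_CONCEPTS
    if missing or unknown:
        raise ValueError(
            f"{code}: missing={sorted(missing)} unknown={sorted(unknown)}"
        )
    REGISTRY[code] = dict(table)

# An example Spanish-shaped keyword set (illustrative only).
register_language(
    "es",
    {"si": "COND", "para": "LOOP", "en": "IN", "imprimir": "PRINT"},
)
```

With a check like this, a dialect that cannot map back to the core is rejected when it is registered, rather than failing later during parsing.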


Open questions and limitations

There are plenty of hard questions I’m still exploring:

  • How far can you push word-order differences before the surface layer becomes too complex?
  • How do you keep each mapping semantics-preserving, and how do you test that rigorously?
  • Where do you draw the line between “just syntax” and constructs that really should live in the core language?
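One possible starting point for the testing question is a round-trip check: render a core token sequence into each surface language, normalize it back, and require that nothing changed. The sketch below is an assumption about how such a test could look, with a hypothetical KEYWORDS table and helper names. It checks a necessary condition only and is far from a proof of semantic preservation.

```python
# Hypothetical keyword tables: core concept -> surface word.
KEYWORDS = {
    "en": {"COND": "if", "LOOP": "for", "IN": "in", "PRINT": "print"},
    "fr": {"COND": "si", "LOOP": "pour", "IN": "dans", "PRINT": "afficher"},
}

def render(core_tokens, lang):
    """Core tokens -> surface tokens for one language."""
    table = KEYWORDS[lang]
    return [table.get(tok, tok) for tok in core_tokens]

def normalize(surface_tokens, lang):
    """Surface tokens -> core tokens for one language."""
    inverse = {word: concept for concept, word in KEYWORDS[lang].items()}
    return [inverse.get(tok, tok) for tok in surface_tokens]

def check_round_trip(core_tokens):
    """Fail if any language loses or changes information on the way back."""
    for lang, table in KEYWORDS.items():
        # Two concepts sharing one surface word would make the mapping ambiguous.
        assert len(set(table.values())) == len(table), lang
        assert normalize(render(core_tokens, lang), lang) == core_tokens, lang

check_round_trip(["COND", "x", ">", "0", ":"])
check_round_trip(["LOOP", "n", "IN", "1", "..", "3", ":"])
```

A check like this also surfaces keyword and identifier clashes. A program that uses si as a variable name fails the French round trip, because the flat token list cannot tell the identifier from the keyword. That suggests tokens should carry a kind (keyword or identifier) before they reach the core.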

Right now, the project is deliberately small and experimental; the focus is on keeping the core minimal and making the mappings explicit, rather than covering every possible language feature.


How to try it, and how you can help

If this idea interests you, you can:

  • Check out the repo: https://github.com/johnsamuelwrites/multilingual
  • Run the examples and look at how the surface syntax modules map into the core.
  • Open issues or PRs with:
    • new language mappings,
    • critiques of the core design,
    • ideas for better ways to specify and test the semantics.
