Roman Melnikov

Posted on May 18

The 2,600-Line Compiler That Compiles Itself and Emits F#, TypeScript, Python, Java, and C#

#compilers #fsharp #typescript #opensource

ll-lang started as a language for LLM code generation. The more interesting update now is not the syntax pitch. It is the compiler story: the compiler is written in ll-lang itself, the bootstrap reached a fixpoint, and the same source can emit several host targets.

That gives the project two concrete proofs:

compiler₁.fs == compiler₂.fs
one source language can emit F#, TypeScript, Python, Java, and C#, plus an experimental LLVM target

That combination moves ll-lang out of the "interesting syntax demo" category.

Why the self-hosting milestone matters

A lot of language projects can parse a toy file. Fewer can move their own compiler into the language they are building.

The ll-lang repo now documents the bootstrap as complete. The canonical compiler lives in the self-hosted path, while the archived stage0 code stays under obsolete/stage0 only for recovery diagnostics. The README makes the claim explicit: compiler₁.fs == compiler₂.fs.

That fixpoint matters because it means the compiler output stabilizes under its own pipeline. It is not just "the compiler can probably compile itself." It is "the compiler compiles itself, and the emitted artifact stops changing."
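The check behind that claim is small. Here is a minimal sketch in Python, assuming the two emitted files have already been produced by successive self-compilations. The function names are illustrative and not taken from the ll-lang repo:

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an emitted artifact."""
    return hashlib.sha256(data).hexdigest()


def is_fixpoint(first: bytes, second: bytes) -> bool:
    """True when two successive compiler outputs are byte-identical."""
    return digest(first) == digest(second)


def files_are_fixpoint(first_path: str, second_path: str) -> bool:
    """Same check, reading the two emitted files from disk."""
    return is_fixpoint(Path(first_path).read_bytes(),
                       Path(second_path).read_bytes())
```

Given two emitted files saved under hypothetical names such as compiler1.fs and compiler2.fs, files_are_fixpoint returns True only when they match byte for byte.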

For a project aimed at LLM workflows, that matters even more. The promise is not just smaller syntax. The promise is a closed, testable toolchain.

The bootstrap path

The documented bootstrap path is intentionally simple:

git clone https://github.com/Neftedollar/ll-lang.git
cd ll-lang
./tools/bootstrap-self.sh install
BOOTSTRAP_BIN="$(./tools/bootstrap-self.sh path)"
"$BOOTSTRAP_BIN" check "$PWD/lllcself/src/Main.lll"

The pinned artifact is verified against bootstrap/lllc-bootstrap.lock.json, so the bootstrap does not depend on the state of the local machine. On success the compiler reports OK and exits 0.
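Pinning of this kind usually comes down to comparing a checksum against a recorded value. Here is a minimal sketch in Python, assuming a lock file with a single sha256 field. The post does not show the real schema of bootstrap/lllc-bootstrap.lock.json, so the field name and the function are hypothetical:

```python
import hashlib
import json


def verify_pinned_artifact(artifact: bytes, lock_text: str) -> bool:
    """Check an artifact against the digest recorded in a JSON lock.

    Assumes a hypothetical lock shape: {"sha256": "<hex digest>"}.
    """
    lock = json.loads(lock_text)
    expected = lock["sha256"].lower()
    actual = hashlib.sha256(artifact).hexdigest()
    return actual == expected
```

A mismatch means the downloaded or cached binary is not the one that was pinned, which is the situation a bootstrap script would refuse to continue from.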

There is also a strict launcher:

./tools/lllc-bootstrap.sh check "$PWD/lllcself/src/Main.lll"

That wrapper runs the pinned bootstrap path directly instead of silently falling back to older bridge behavior.

The multi-target path

Self-hosting is only half the story. A compiler that only proves it can emit itself is interesting to compiler people. A compiler that can also target multiple ecosystems is easier for application teams to try.

ll-lang exposes that through the same CLI surface:

lllc build --target fs   shapes.lll
lllc build --target ts   shapes.lll
lllc build --target py   shapes.lll
lllc build --target java shapes.lll
lllc build --target cs   shapes.lll
lllc build --target llvm shapes.lll
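If you want to drive all targets from a script, the loop is short. The sketch below is in Python and uses only the command shape shown above. The helper names are made up for illustration, and the run parameter is injectable so the loop can be exercised without lllc installed:

```python
import subprocess

# Target names as they appear in the commands above.
TARGETS = ["fs", "ts", "py", "java", "cs", "llvm"]


def build_command(target: str, source: str) -> list[str]:
    """Return the argv for one `lllc build --target <target> <source>` call."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    return ["lllc", "build", "--target", target, source]


def build_all(source: str, run=subprocess.run) -> list[list[str]]:
    """Run the build for every target and return the commands that ran.

    With the default `run`, this needs `lllc` on PATH and stops at the
    first failing build because of check=True.
    """
    commands = [build_command(t, source) for t in TARGETS]
    for argv in commands:
        run(argv, check=True)
    return commands
```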

The repository tutorial shows the same algebraic data type mapped into several host languages:

F# discriminated unions
TypeScript tagged unions
Python @dataclass + pattern matching
Java 21 sealed interfaces + records
C# records

That is the practical angle. The source language stays compact for prompt-driven authoring, while the output lands in ecosystems teams already run in production.

One source, multiple outputs

The multi-target tutorial uses a simple source shape:

module Shapes

Shape = Circle Float | Rect Float Float | Empty

area(s Shape) Float =
  | Circle r  -> 3.14159 * r * r
  | Rect w h  -> w * h
  | Empty     -> 0.0

From there, ll-lang emits target-specific code instead of flattening everything to one runtime model. That is what good cross-target compilers do: preserve the source idea while producing code that still looks native enough for the host ecosystem.

What I would try first

If you want the fastest way to evaluate the project:

Read the README bootstrap section.
Open lllcself/src/Main.lll to see that the compiler entrypoint is in ll-lang.
Read docs/tutorials/04-multi-target.md and compare the emitted TypeScript, Python, and Java shapes.
Run the bootstrap installer on a clean machine.

That sequence shows both halves of the project without requiring a deep compiler-internals read first.

Why this matters for LLM workflows

The original thesis behind ll-lang is that a smaller, statically checked syntax reduces noise in prompts and makes diagnostics machine-readable. An MCP server ships with the toolchain for agent use.

What changed is that the implementation now backs up the thesis. The compiler is self-hosting. The stdlib is self-hosted. The project system is real. The codegen path is not locked to one runtime.

That is a stronger story than "language for LLMs." It is a claim with compiler evidence behind it.

What the line count actually is

The self-hosted compiler CLI under lllcself/src/ is 2,589 lines today. So "about 2,600 lines" is the accurate version of the headline, even if "under 3,000 lines" is the broader takeaway.
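A count like this is easy to reproduce. Here is a minimal sketch in Python, assuming the figure covers every .lll file under the directory and counts all lines, blanks and comments included. The post does not say how the number was measured, so both choices are assumptions:

```python
from pathlib import Path


def count_lines(text: str) -> int:
    """Count lines in a source text, including blank and comment lines."""
    return len(text.splitlines())


def count_tree(root: str, suffix: str = ".lll") -> int:
    """Sum line counts over every file under `root` with the given suffix."""
    return sum(
        count_lines(path.read_text(encoding="utf-8"))
        for path in Path(root).rglob(f"*{suffix}")
        if path.is_file()
    )
```

Running count_tree("lllcself/src") from the repository root gives a number to compare against the one quoted here. A different counting rule, such as skipping blank lines, would give a smaller figure.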

What this milestone does not mean

It does not mean every part of the stack is finished. The repo is explicit that llvm is still experimental, and that some self-host routing work is still documented as a controlled rollout. That is a positive signal, not a weakness. Compiler projects get more trustworthy when they mark their edges clearly.

The short version

The easy headline for ll-lang would be "a typed language for LLMs."

The better headline now is narrower and more technical:

ll-lang reached a self-hosting bootstrap fixpoint, and the same language can already emit multiple mainstream targets.

That is the kind of milestone that turns a positioning story into an implementation story.

GitHub: https://github.com/Neftedollar/ll-lang
Landing page: https://neftedollar.com/ll-lang/
Earlier intro post: https://dev.to/neftedollar/why-we-built-ll-lang-a-statically-typed-functional-language-for-llms-2hg8
