How I Built a Language That Beats C on LZ77 by 6.6×, Bootstrapped from Assembly

#claude #systems #ai #programming

Disclosure: This project was built with AI assistance (Claude).

I built a systems programming language called Jda from scratch — zero C, zero Rust,
zero Python anywhere in the toolchain. The compiler is bootstrapped from raw x86-64
assembly, fully self-hosted, and compiles itself to a byte-identical binary.

## The Headline Result

On LZ77 compression (1 MB of data), Jda clocks 277ms against:

- C (clang -O2): 1,830 ms, so Jda is 6.6× faster
- Rust (rustc -O): 2,185 ms, so Jda is 7.9× faster
- Go: 2,721 ms, so Jda is 9.8× faster

The Jda binary is x86-64 code running under Rosetta 2 translation on Apple Silicon, not native ARM64.

## Full Benchmark Table (Apple Silicon, ms, lower is better)

| Benchmark | C | Rust | Go | Jda |
|---|--:|--:|--:|--:|
| Sudoku — 500 puzzles | 62 | 62 | 66 | 41 |
| LZ77 — 1 MB compress | 1,830 | 2,185 | 2,721 | 277 |
| Regex — 8 pats × 100K | 98 | 221 | 813 | 186 |
| B-Tree — 1M ops | 282 | 297 | 318 | 586 |
| Raytracer — 800×600 | 19 | 21 | 35 | 331 |

Jda posts the fastest time on 2 of 5 benchmarks (Sudoku and LZ77) and beats Rust and Go, but not C, on Regex. All three competitors are compiled to native ARM64.
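The ratios quoted in this post can be recomputed from the table. The snippet below is only a convenience sketch: the timings are copied from the table above, and the helper names `speedup_vs` and `fastest` are made up for this illustration.

```python
# Timings in milliseconds, copied from the benchmark table above.
TIMINGS_MS = {
    "Sudoku":    {"C": 62,   "Rust": 62,   "Go": 66,   "Jda": 41},
    "LZ77":      {"C": 1830, "Rust": 2185, "Go": 2721, "Jda": 277},
    "Regex":     {"C": 98,   "Rust": 221,  "Go": 813,  "Jda": 186},
    "B-Tree":    {"C": 282,  "Rust": 297,  "Go": 318,  "Jda": 586},
    "Raytracer": {"C": 19,   "Rust": 21,   "Go": 35,   "Jda": 331},
}


def speedup_vs(baseline, bench):
    """Baseline time divided by Jda time; values below 1 mean Jda is slower."""
    row = TIMINGS_MS[bench]
    return row[baseline] / row["Jda"]


def fastest(bench):
    """Languages with the lowest time on a benchmark (a list, in case of ties)."""
    row = TIMINGS_MS[bench]
    best = min(row.values())
    return [lang for lang, ms in row.items() if ms == best]


if __name__ == "__main__":
    for bench in TIMINGS_MS:
        ratio = speedup_vs("C", bench)
        print(f"{bench:10s} C/Jda = {ratio:.2f}x  fastest: {fastest(bench)}")
```

A ratio above 1 in the `C/Jda` column means Jda finished sooner than the C build on that benchmark.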

## Why Is LZ77 6.6× Faster Than C?

Two compiler optimizations do the work:

1. MOD→AND strength reduction

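The LZ77 hash chain uses a 4096-entry window, so every iteration computes `hash % 4096`. Because 4096 is a power of two, Jda's compiler rewrites this to `hash & 4095`, which removes every IDIV instruction from the inner loop. The C, Rust, and Go versions keep a slower form on this pattern. That gap depends on the operand type: with an unsigned `hash`, clang, rustc, and Go also reduce the modulo to a mask, so it only shows up when `hash` is a signed integer.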
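To see why the rewrite is safe, here is a small check of the identity behind it, written in Python rather than Jda. The names `WINDOW_SIZE`, `slot_mod`, and `slot_and` are invented for this sketch; only the value 4096 comes from the benchmark.

```python
WINDOW_SIZE = 4096             # window size from the text; must be a power of two
WINDOW_MASK = WINDOW_SIZE - 1  # 4095 == 0xFFF


def is_power_of_two(n):
    """True when n has exactly one bit set, the precondition for the rewrite."""
    return n > 0 and (n & (n - 1)) == 0


def slot_mod(h):
    """Window slot computed with a modulo (the form written in the source)."""
    return h % WINDOW_SIZE


def slot_and(h):
    """Window slot computed with a mask (the form after strength reduction)."""
    return h & WINDOW_MASK


if __name__ == "__main__":
    assert is_power_of_two(WINDOW_SIZE)
    # For non-negative hashes both forms select the same slot.
    assert all(slot_mod(h) == slot_and(h) for h in range(1 << 16))
    # With a divisor that is not a power of two the identity breaks.
    assert 10 % 6 != 10 & 5
    print("mod and mask agree for all tested hashes")
```

The identity holds only for power-of-two divisors and non-negative values. Python's `%` rounds toward negative infinity, so this check does not exercise the signed case: in languages with truncating remainder, such as C, a negative `hash` needs a few extra fix-up instructions.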

2. Loop-Invariant Code Motion (LICM)

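The maximum match length and the first-byte filter are loop-invariant, so Jda hoists them out of the inner match scan and evaluates them once per position instead of on every pass. In this benchmark that halves the number of inner-loop iterations.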
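As a source-level illustration of what hoisting means here, the two functions below do the same match scan, once with the invariants recomputed per candidate and once with them lifted out of the loop. This is a sketch under assumptions, not Jda's benchmark code: `MAX_MATCH`, the cap of 18, and both function names are hypothetical.

```python
MAX_MATCH = 18  # hypothetical cap on match length, not taken from the benchmark


def longest_match_naive(data, pos, candidates):
    """Scan earlier positions for the longest match; requires pos < len(data)."""
    best_len, best_pos = 0, -1
    for cand in candidates:
        # Neither value depends on `cand`, yet both are recomputed every pass.
        limit = min(MAX_MATCH, len(data) - pos)
        first = data[pos]
        if data[cand] != first:
            continue
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_pos = n, cand
    return best_len, best_pos


def longest_match_hoisted(data, pos, candidates):
    """Same scan with the loop-invariant values computed once, before the loop."""
    limit = min(MAX_MATCH, len(data) - pos)
    first = data[pos]
    best_len, best_pos = 0, -1
    for cand in candidates:
        if data[cand] != first:
            continue
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_pos = n, cand
    return best_len, best_pos


if __name__ == "__main__":
    data = b"abcabcabcabd"
    print(longest_match_naive(data, 6, [0, 3]))
    print(longest_match_hoisted(data, 6, [0, 3]))
```

Both versions return the same result for the same input. The hoisted one simply avoids repeating work that cannot change between candidates, which is the transformation an LICM pass performs automatically.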

## Sudoku: 1.5× Faster Than C and Rust

The hot path is a bitmask scan over candidates. Jda's MOD→AND peephole, copy propagation, and loop register promotion keep the hot values in registers and remove redundant loads and stores that the C build (clang -O2) leaves in memory.
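For readers unfamiliar with the technique, a bitmask candidate scan can look like the sketch below. The 9-bit layout, `ALL_DIGITS`, and both function names are assumptions made for illustration; the actual Jda solver may represent candidates differently.

```python
ALL_DIGITS = 0x1FF  # bits 0..8 stand for digits 1..9 (assumed layout)


def candidates(row_used, col_used, box_used):
    """Mask of digits not yet used in the cell's row, column, or box."""
    return ALL_DIGITS & ~(row_used | col_used | box_used)


def iter_digits(mask):
    """Yield the digits whose bits are set, lowest first."""
    while mask:
        low = mask & -mask       # isolate the lowest set bit
        yield low.bit_length()   # bit 0 maps to digit 1, bit 8 to digit 9
        mask ^= low              # clear that bit and continue


if __name__ == "__main__":
    row = 0b000000111   # digits 1, 2, 3 already in the row
    col = 0b000111000   # digits 4, 5, 6 already in the column
    box = 0b100000000   # digit 9 already in the box
    print(list(iter_digits(candidates(row, col, box))))
```

The loop body is a handful of register operations, which is why keeping the masks out of memory matters so much for this benchmark.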

## Self-Hosting

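The compiler bootstraps in four stages:

1. The raw x86-64 assembly source is assembled into `jda0`.
2. `jda0` compiles `jda1.jda` to produce `jda1`.
3. `jda1` compiles `jda1.jda` to produce `jda1_sh2`.
4. `jda1_sh2` compiles `jda1.jda` to produce `jda1_sh3`, which is byte-identical to `jda1_sh2`.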

Fixed point converged. 388 conformance tests passing.
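A fixed-point check of this kind boils down to comparing two stage binaries byte for byte. The helper below is a minimal sketch of such a check, not the project's own tooling: `sha256_of` and `is_fixed_point` are hypothetical names, and the file names in the usage lines are the stage names from this section.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, handy for logging which build was checked."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_fixed_point(stage_a, stage_b):
    """True when two compiler binaries are byte-identical."""
    return Path(stage_a).read_bytes() == Path(stage_b).read_bytes()


if __name__ == "__main__":
    # Assumes both stage binaries sit in the current directory.
    if is_fixed_point("jda1_sh2", "jda1_sh3"):
        print("fixed point reached:", sha256_of("jda1_sh3"))
    else:
        print("stages differ")
```

If the two stages ever differ, the compiler's output depends on which binary compiled it, which usually points to a miscompilation or to nondeterminism such as embedded timestamps.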

## Compile Speed

| | gcc -O2 | rustc -O | go build | Jda |
|--|--:|--:|--:|--:|
| Average | 479ms | 1,497ms | 712ms | 43ms |

About 35× faster than rustc (1,497 ms vs 43 ms). Jda is a single-pass compiler with no linker step and no intermediate files.

## What It Has

- No GC: manual memory management with `alloc_pages`
- Goroutine-style green threads and channels
- Tensors, autograd, neural networks
- AVX-512 / CUDA / ROCm acceleration
- 117 stdlib packages
- Full-stack web framework (Jda Forge)
- VS Code and JetBrains plugins

## Try It


    curl -sSf https://jdalang.org/install.sh | sh

- GitHub: https://github.com/jdalang/jda-lang
- Website: https://jdalang.org
- Benchmarks: https://jdalang.org/benchmarks/complex-benchmarks/

All benchmark source code is in the repo: six implementations of each problem
(C, Rust, Go, Jda, Ruby, Python) side by side.

Would love feedback on the language design, benchmark methodology, or anything
that feels wrong or missing.
