DEV Community

JAI LALAWAT


How I Built a Language That Beats C on LZ77 by 6.6×, Bootstrapped from Assembly

Disclosure: This project was built with AI assistance (Claude).

I built a systems programming language called Jda from scratch — zero C, zero Rust,
zero Python anywhere in the toolchain. The compiler is bootstrapped from raw x86-64
assembly, fully self-hosted, and compiles itself to a byte-identical binary.

## The Headline Result

On LZ77 compression (1 MB of data), Jda clocks 277ms against:

  • C (clang -O2): 1,830ms — Jda is 6.6× faster
  • Rust (rustc -O): 2,185ms — Jda is 7.9× faster
  • Go: 2,721ms — Jda is 9.8× faster

The Jda binary is x86-64 code running under Rosetta 2 on Apple Silicon, not native ARM64.
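The speedup figures are plain ratios of the reported wall-clock times. A minimal sketch of that arithmetic, using only the LZ77 numbers quoted above (the dictionary and function names are illustrative and not part of the benchmark suite):

```python
# LZ77 timings in milliseconds, copied from the results quoted in this post.
LZ77_MS = {"C": 1830, "Rust": 2185, "Go": 2721, "Jda": 277}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def speedups_vs(candidate: str, timings: dict) -> dict:
    """Speedup of `candidate` over every other entry, rounded to one decimal."""
    return {
        name: round(speedup(ms, timings[candidate]), 1)
        for name, ms in timings.items()
        if name != candidate
    }


if __name__ == "__main__":
    for name, factor in speedups_vs("Jda", LZ77_MS).items():
        print(f"Jda vs {name}: {factor}x")
```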

## Full Benchmark Table (Apple Silicon, ms, lower is better)

| Benchmark | C | Rust | Go | Jda |
|---|--:|--:|--:|--:|
| Sudoku — 500 puzzles | 62 | 62 | 66 | 41 |
| LZ77 — 1 MB compress | 1,830 | 2,185 | 2,721 | 277 |
| Regex — 8 pats × 100K | 98 | 221 | 813 | 186 |
| B-Tree — 1M ops | 282 | 297 | 318 | 586 |
| Raytracer — 800×600 | 19 | 21 | 35 | 331 |

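Jda wins 2 of 5 benchmarks outright (Sudoku and LZ77) against native ARM64 compiled languages. On Regex it beats Rust and Go but not C, and it trails all three on B-Tree and Raytracer.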

## Why Is LZ77 6.6× Faster Than C?

Two compiler optimizations do the work:

1. MOD→AND strength reduction

The LZ77 hash-chain uses a 4096-entry window. Every iteration computes:
hash % 4096
Jda's compiler rewrites this to:
hash & 4095
This eliminates every IDIV instruction from the inner loop. C, Rust, and Go
all keep the slower division form on this pattern.
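To make the rewrite concrete, here is a small Python sketch of the identity behind it. It is not Jda's compiler code, which is not shown in this post. The helper names are made up for illustration, and `c_style_rem` is an assumption-labelled emulation of C's truncating signed remainder, included because the mask form only matches the remainder form when the operand is non-negative (or unsigned).

```python
WINDOW = 4096  # window size from the post; the trick requires a power of two


def is_power_of_two(n: int) -> bool:
    """True when n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def slot_mod(h: int) -> int:
    """Window slot computed with the remainder operator."""
    return h % WINDOW


def slot_and(h: int) -> int:
    """Window slot computed with a bit mask (the strength-reduced form)."""
    return h & (WINDOW - 1)


def c_style_rem(a: int, n: int) -> int:
    """Emulate C's signed `%`, which truncates toward zero (unlike Python's)."""
    r = abs(a) % n
    return -r if a < 0 else r


if __name__ == "__main__":
    assert is_power_of_two(WINDOW)
    # For non-negative hashes the two forms agree.
    assert all(slot_mod(h) == slot_and(h) for h in range(100_000))
    # Under C semantics a negative operand breaks the equivalence, so the
    # direct swap is only valid when the value is known to be non-negative.
    print(c_style_rem(-1, WINDOW), -1 & (WINDOW - 1))
```

The design point is that the mask is a single cheap ALU operation, while a general division is one of the slowest integer instructions, so the swap matters most inside a hot loop.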

2. Loop-Invariant Code Motion (LICM)

The maximum match length and first-byte filter are loop-invariant — Jda hoists
them out of the inner match scan, halving the number of iterations.
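A hedged sketch of what hoisting looks like on a match scan, written in Python for readability. The function names, the `candidates` list, and the `MAX_MATCH` value are assumptions for illustration; the post does not show the benchmark's actual loop. Both versions return the same result, and the second simply computes the two invariants once.

```python
MAX_MATCH = 258  # assumed cap on match length; the post does not state one


def longest_match_naive(data: bytes, pos: int, candidates: list) -> int:
    """Scan earlier positions, recomputing the invariants on every pass.

    Assumes pos < len(data) and every candidate is less than pos.
    """
    best = 0
    for cand in candidates:
        limit = min(MAX_MATCH, len(data) - pos)  # invariant: never changes
        first = data[pos]                        # invariant: never changes
        if data[cand] != first:                  # first-byte filter
            continue
        length = 0
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        best = max(best, length)
    return best


def longest_match_hoisted(data: bytes, pos: int, candidates: list) -> int:
    """Same scan with both invariants computed once, before the loop."""
    limit = min(MAX_MATCH, len(data) - pos)
    first = data[pos]
    best = 0
    for cand in candidates:
        if data[cand] != first:                  # first-byte filter
            continue
        length = 0
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        best = max(best, length)
    return best


if __name__ == "__main__":
    sample = b"abcabcabcd"
    print(longest_match_naive(sample, 3, [0, 1, 2]))
    print(longest_match_hoisted(sample, 3, [0, 1, 2]))
```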

## Sudoku: 1.5× Faster Than C and Rust

The hot path is a bitmask scan over candidates. Jda's MOD→AND peephole,
copy propagation, and loop register promotion remove redundant loads and
stores for values that gcc and clang keep in memory.
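For readers unfamiliar with the technique, a bitmask candidate scan can be sketched as follows. This is an assumption about the general shape of such a solver, not the benchmark's actual source: the 9-bit encoding (bit k set means digit k+1) and the function names are illustrative.

```python
ALL_DIGITS = 0x1FF  # nine bits; bit k set means digit k + 1 is still possible


def candidates(row_used: int, col_used: int, box_used: int) -> int:
    """Bitmask of digits not yet used in the cell's row, column, or box."""
    return ~(row_used | col_used | box_used) & ALL_DIGITS


def iter_digits(mask: int):
    """Yield each candidate digit by peeling off the lowest set bit."""
    while mask:
        low = mask & -mask        # isolate the lowest set bit
        yield low.bit_length()    # bit k (0-based) maps to digit k + 1
        mask &= mask - 1          # clear that bit and continue


if __name__ == "__main__":
    row = 0b000000111   # digits 1, 2, 3 already in the row
    col = 0b000011000   # digits 4, 5 already in the column
    box = 0b100000000   # digit 9 already in the box
    print(list(iter_digits(candidates(row, col, box))))
```

A loop like this touches only a handful of integer values, which is why keeping them in registers instead of reloading them from memory can pay off.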

## Self-Hosting

The compiler bootstraps like this:

1. Assembly → jda0
2. jda0 compiles jda1.jda → jda1
3. jda1 compiles jda1.jda → jda1_sh2
4. jda1_sh2 compiles jda1.jda → jda1_sh3
5. jda1_sh3 is byte-identical to jda1_sh2

Fixed point converged. 388 conformance tests passing.
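If you want to check a fixed point like this yourself, a byte-for-byte comparison of two stage binaries is enough. The sketch below is a generic helper, not a script from the Jda repo; the function names are hypothetical and the two file paths are whatever your own bootstrap run produced (for example the jda1_sh2 and jda1_sh3 outputs).

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path) -> str:
    """Hex SHA-256 digest of a file, handy for logging each stage."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_fixed_point(stage_a, stage_b) -> bool:
    """True when the two compiler binaries are byte-identical."""
    return Path(stage_a).read_bytes() == Path(stage_b).read_bytes()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        a, b = sys.argv[1], sys.argv[2]
        print(a, sha256_of(a))
        print(b, sha256_of(b))
        print("byte-identical" if is_fixed_point(a, b) else "binaries differ")
    else:
        print("usage: python check_fixed_point.py STAGE_A STAGE_B")
```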

## Compile Speed

| | gcc -O2 | rustc -O | go build | Jda |
|--|--:|--:|--:|--:|
| Average | 479ms | 1,497ms | 712ms | 43ms |

About 35× faster than rustc -O (1,497ms vs 43ms). Single-pass compiler, no linker, no intermediate files.
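For anyone reproducing compile-time averages, a simple wall-clock harness might look like the sketch below. It is an assumption about methodology, not the script used for the table above, and `average_wall_ms` is a hypothetical helper. The demo command is a stand-in (the Python interpreter doing nothing) because the post does not give the exact compiler invocations.

```python
import statistics
import subprocess
import sys
import time


def average_wall_ms(cmd: list, runs: int = 5) -> float:
    """Mean wall-clock time in milliseconds over `runs` executions of cmd."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in command; swap in the real compiler command line you want to time.
    demo_cmd = [sys.executable, "-c", "pass"]
    print(f"{average_wall_ms(demo_cmd, runs=3):.1f} ms")
```

Note that a harness like this includes process startup time, so very fast compilers are measured together with the cost of launching them.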

## What It Has

  • No GC — manual memory with alloc_pages
  • Goroutine-style green threads + channels
  • Tensors, autograd, neural networks
  • AVX-512 / CUDA / ROCm acceleration
  • 117 stdlib packages
  • Full-stack web framework (Jda Forge)
  • VS Code + JetBrains plugins

## Try It


    curl -sSf https://jdalang.org/install.sh | sh

- GitHub: https://github.com/jdalang/jda-lang
- Website: https://jdalang.org
- Benchmarks: https://jdalang.org/benchmarks/complex-benchmarks/

All benchmark source code is in the repo: six implementations of each problem
(C, Rust, Go, Jda, Ruby, Python) side by side.

Would love feedback on the language design, benchmark methodology, or anything
that feels wrong or missing.
