Ardon-R2: Inspired by R, Built on Rust. An AI-Assisted Project. v0.1.1.

devendra tandle — Thu, 21 May 2026 10:48:57 +0000

Today I am open-sourcing Ardon-R2, a Rust reimplementation of R's runtime. The repository is live at github.com/devendratandle/Ardon-R2 under AGPL-3.0, with a side-by-side benchmark harness you can point at your own R install. If you write R for a living, it is worth running against your hottest scripts.

Ardon-R2 keeps R's surface — formulas, data frames, lm, t.test, summary, the lot — and rebuilds the engine underneath in Rust. The statistical numerics are bit-identical to CRAN R 4.5.3. The runtime is a single static binary, memory-safe by construction, free of garbage collection, and built on a frozen dependency graph that does not rot. Inspired by R. Built on Rust.

It is also unmistakably neonatal. Faster than R where computation dominates. On par where memory bandwidth does. Missing chunks of S4, parts of R5, the long tail of CRAN. I am shipping anyway because shipping is how a project earns the right to grow.

The rest of this article is a tour of three things: what Ardon-R2 is, what Ardon-R2 does today, and what Ardon-R2 will do next. The story of how it was built belongs in a later piece, after the project has earned the audience to listen to it.

Why a new engine

R is a beautiful language that runs on an engine designed when "multi-core" was an enterprise-server feature, not something found in every phone. The interpreter is single-threaded by default. The garbage collector can pause a workload long enough to feel it. Every df$x <- df$x + 1 allocates a fresh vector because R's semantics are copy-on-modify. The package ecosystem is gorgeous and enormous, and it is built almost entirely on C and C++ shared libraries that someone, somewhere, has to keep maintaining.

When that maintenance breaks, packages die quietly.

In 2022, RGtk2 — the package that gave R bindings to GTK2 — was archived on CRAN because nobody could keep it building against a deprecated GUI toolkit. RGtk2 was a dependency of rattle, the data-mining GUI that an entire generation of analysts learned R on. So rattle effectively died. Not because its statistics were wrong. Not because anyone wrote a competing tool. Because a C library three layers underneath it stopped being maintained. That story repeats across the CRAN graph more often than the community likes to talk about.

Ardon-R2's bet is that a runtime built on Cargo's workspace model — frozen dependencies, audited tree, no C/C++ shared-library archaeology, no version-maintenance hell — doesn't have rattle-shaped failures waiting in its future. The language stays expressive. The engine stops being fragile.

What's shipped

v0.1.1, today, with no asterisks:

Statistics. lm, glm (gaussian, binomial, poisson via IRLS), aov, anova, t.test (one-sample, two-sample, paired, Welch), wilcox.test, cor.test, shapiro.test, fisher.test, chisq.test. summary() for every model class. Formula handling with treatment-contrast expansion of factor and character predictors — lm(y ~ Species, data=iris) works the way R users expect it to.

Linear algebra. Matrix, tensor, transpose, crossprod, solve, SVD, eigen, QR, fused least-squares. Backed by tuned numerical kernels, not LAPACK bindings.

Summary-stat kernels. Parallel prefix scan, quickselect-based nth-smallest, binary-heap top-k, deque-based O(n) rolling max/min, rolling mean/sum/sd, hash-aggregate, pairwise distance. Each kernel has scalar, SIMD, and parallel variants chosen at dispatch time.
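
As an illustration of the deque-based O(n) rolling max named above, here is a minimal Rust sketch of the standard monotonic-deque technique. It is not taken from the Ardon-R2 source: the function name rolling_max is hypothetical, only the scalar variant is shown, and NaN/NA handling is deliberately left out.

```rust
use std::collections::VecDeque;

/// Rolling maximum over a sliding window using a monotonic deque.
/// Each index is pushed once and popped at most once, so the total cost is O(n).
/// Assumption: inputs contain no NaN; an R-compatible kernel would need NA rules.
pub fn rolling_max(xs: &[f64], window: usize) -> Vec<f64> {
    if window == 0 || xs.len() < window {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(xs.len() - window + 1);
    // Indices whose values are strictly decreasing; the front is the current max.
    let mut dq: VecDeque<usize> = VecDeque::new();
    for i in 0..xs.len() {
        // Drop the front index once it has slid out of the window.
        if let Some(&front) = dq.front() {
            if front + window <= i {
                dq.pop_front();
            }
        }
        // Drop smaller-or-equal values from the back: they can never be the max again.
        while let Some(&back) = dq.back() {
            if xs[back] <= xs[i] {
                dq.pop_back();
            } else {
                break;
            }
        }
        dq.push_back(i);
        // Emit a result once the first full window has been seen.
        if i + 1 >= window {
            out.push(xs[dq[0]]);
        }
    }
    out
}
```

Calling rolling_max(&[1.0, 3.0, 2.0, 5.0, 4.0], 3) yields one value per full window. The rolling min is the same code with the comparison flipped.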

Data frames. Position, name, and logical subsetting. $ access and assignment. names, nrow, ncol, dim. Datasets iris, mtcars, airquality, ToothGrowth, faithful shipped in a native binary format (.r2d) — bit-identical to CRAN R 4.5.3 values, verified by integrity tests on every build.

Machine learning. k-means, k-nearest-neighbors, decision trees, random forests, naive Bayes, and a basic neural network with manual forward and backward passes. Not deep learning — yet. The plan for that is below.

Plotting. SVG output for plot, hist, boxplot, barplot, pairs, qqplot, density.

REPL. Multi-line continuation, syntax-aware prompt, line-edit history, help operators.

Core statistics are bit-identical to R 4.5.3. Roughly four out of five of the R idioms analysts write day-to-day run as written. The fifth is what the road from here closes.

How it stays fast without asking you to think about it

The engine has four cooperating layers — a representation of your code, a just-in-time compiler that turns hot paths into native machine code, a library of hand-tuned parallel primitives for the operations R does badly, and a hardware-aware dispatcher that picks the right path for the machine it is running on. Underneath sits a columnar memory layer that holds dense numeric data without copying on every operation, and a thread-local scratch arena that recycles the allocations that statistical workloads need most.
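
To give a feel for what a hardware-aware dispatcher does, here is a minimal Rust sketch that picks between a scalar, SIMD, or parallel path from the input length and the number of available cores. Everything in it is an assumption made for illustration: the KernelPath enum, the choose_path function, and both thresholds are hypothetical and are not Ardon-R2's actual dispatch rules.

```rust
/// Hypothetical set of execution paths a kernel might offer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KernelPath {
    Scalar,
    Simd,
    Parallel,
}

/// Pick a path from the input length and core count.
/// The thresholds are illustrative placeholders, not measured values.
pub fn choose_path(len: usize, cores: usize) -> KernelPath {
    const SIMD_MIN_LEN: usize = 64;
    const PARALLEL_MIN_LEN: usize = 1 << 20;
    if len >= PARALLEL_MIN_LEN && cores > 1 {
        KernelPath::Parallel
    } else if len >= SIMD_MIN_LEN {
        KernelPath::Simd
    } else {
        KernelPath::Scalar
    }
}

/// Number of hardware threads the standard library reports, falling back to 1.
pub fn detected_cores() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```

A caller would use choose_path(x.len(), detected_cores()) and branch on the result. A real dispatcher would presumably also consider CPU feature flags and the operation's arithmetic intensity.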

You do not need to know any of that to write R2 code. R2 looks like R. The complexity exists so the language can stay simple. The full architectural story belongs in its own writeup once the project is mature enough that the how matters as much as the what.

The performance story

R2 wins on compute-bound work: fused reductions, math-heavy element-wise operations, anything where the cost is real arithmetic per element, anything that benefits from SIMD or from skipping intermediate allocations. The 11× speedup on the fused sum(sqrt(x*x + 1)) is the headline; analogous wins land on sin(x)^2 + cos(x)^2, on Monte Carlo inner loops, on the repeated sapply patterns analysts write without thinking.
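
What "fused" means for sum(sqrt(x*x + 1)) can be sketched in a few lines of Rust. The unfused version mirrors step-by-step vectorised evaluation, materialising a temporary vector per operation; the fused version makes one pass with no temporaries. This is a sketch of the general idea with hypothetical function names, not Ardon-R2's JIT output, and it makes no claim about the measured speedup.

```rust
/// Unfused: one temporary vector per vectorised step (x*x, then +1, then sqrt).
pub fn unfused(x: &[f64]) -> f64 {
    let squared: Vec<f64> = x.iter().map(|v| v * v).collect();
    let shifted: Vec<f64> = squared.iter().map(|v| v + 1.0).collect();
    let roots: Vec<f64> = shifted.iter().map(|v| v.sqrt()).collect();
    roots.iter().sum()
}

/// Fused: a single pass over x, no intermediate allocations.
pub fn fused(x: &[f64]) -> f64 {
    x.iter().map(|v| (v * v + 1.0).sqrt()).sum()
}
```

Both functions perform the same arithmetic on each element and accumulate in the same left-to-right order, so they return the same value; the difference is only in how much memory is allocated and traversed.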

R2 draws level with R on memory-bandwidth-bound work. Plain x + y on ten-million-element f64 vectors is limited by how fast your DRAM can deliver bytes. No JIT, SIMD, or parallel dispatch reduces the byte traffic. R2 matches R there; it doesn't beat it. Pretending otherwise would be dishonest.
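
The arithmetic behind that limit is short enough to write down. The Rust sketch below counts the minimum bytes an element-wise add has to move (two input vectors read, one output written) and divides by a memory bandwidth. The function names are hypothetical, the 20 GB/s figure in the usage note is an assumed round number rather than a measurement, and cache effects are ignored.

```rust
/// Minimum bytes moved by z = x + y on f64 vectors of length n:
/// read x, read y, write z.
pub fn add_traffic_bytes(n: usize) -> usize {
    3 * n * std::mem::size_of::<f64>()
}

/// Lower bound on wall-clock time, given a sustained bandwidth in bytes per second.
pub fn min_seconds(bytes: usize, bandwidth_bytes_per_sec: f64) -> f64 {
    bytes as f64 / bandwidth_bytes_per_sec
}
```

For ten million elements, add_traffic_bytes gives 240,000,000 bytes. At an assumed 20 GB/s that is a floor of about 12 milliseconds, however the additions themselves are executed.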

Every fused loop is energy not spent. That's not the headline pitch, but it is the substrate.

Reproducibility and numerical fidelity

For academic and regulatory use, R-compatibility means more than "the right answer most of the time." Ardon-R2's core statistics are bit-identical to CRAN R 4.5.3: same t.test Welch degrees-of-freedom, same lm treatment contrasts, same summary() significance stars, same IEEE-754 NaN propagation through every operation. The integrity of the built-in datasets is verified on every build against canonical R column sums and row spot-checks.
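
For readers who have not seen it spelled out, the Welch degrees of freedom mentioned above come from the Welch-Satterthwaite formula. Here is a minimal Rust sketch of the textbook computation; the function names are hypothetical, it is not Ardon-R2's implementation, and it makes no claim to reproduce R's output bit for bit (summation order and precision would have to match for that).

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
fn sample_variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|v| (v - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Returns (t statistic, Welch-Satterthwaite degrees of freedom).
/// Assumes each sample has at least two observations and nonzero variance.
pub fn welch_t(x: &[f64], y: &[f64]) -> (f64, f64) {
    let (nx, ny) = (x.len() as f64, y.len() as f64);
    let a = sample_variance(x) / nx; // squared standard error of mean(x)
    let b = sample_variance(y) / ny; // squared standard error of mean(y)
    let t = (mean(x) - mean(y)) / (a + b).sqrt();
    let df = (a + b).powi(2) / (a * a / (nx - 1.0) + b * b / (ny - 1.0));
    (t, df)
}
```

With x = 1..5 and y = 2, 4, ..., 10 the sample variances are 2.5 and 10, so the degrees of freedom work out to 6.25 / 1.0625, roughly 5.88, rather than the 8 a pooled-variance test would use.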

AGPL-3.0 licensed — a deliberate choice that keeps the engine and its derivatives open. Citation-friendly. Reproducible across runs because the engine has no GC pauses and no thread scheduling jitter to introduce numerical drift.

On efficiency as substrate

Green AI is not a feature you bolt onto a runtime. It is what efficient compute looks like before the marketing layer goes on top.

Every fused loop is work the CPU does once instead of twice. Every skipped intermediate allocation is a cache line that doesn't get evicted. Every parallel dispatch that retires four cores' worth of work in one wall-clock second instead of four is three cores' idle time that wasn't billed to the grid. These compound. Across a single analyst's day they're invisible. Across a fleet of inference servers running statistical validation in production, they show up on the power bill.

Ardon-R2 isn't pitched as a green-AI project. But a leaner runtime is what green AI looks like underneath, and the architecture leans that way on purpose.

What Ardon-R2 will do next

The point of shipping v0.1.1 is to earn the right to ship v1.0. The road from here is not a wishlist — it is the design that was already baked into the architecture before v0.1.1 went out, waiting for the work to land.

Deep learning, the pragmatic way. R2 will gain a tensor surface and a familiar Keras-style API by binding to candle, Hugging Face's pure-Rust machine-learning stack with CUDA, Metal, and CPU backends. This is the fastest credible path from "R2 has a basic neural net" to "R2 trains a transformer on your GPU." Later releases will pull primitives into the kernel layer as performance demands it, but the binding lands first — analysts get to use the deep-learning stack their colleagues are already using, in R syntax, without leaving R2.

Cross-vendor GPU dispatch. Through WebGPU compute kernels, the same R2 script will run on NVIDIA, AMD, Apple Silicon, or Intel Arc hardware. No CUDA lock-in. No vendor-specific toolchain to install. The hardware Oracle becomes device-aware: it sees the GPUs the machine has and dispatches there when it makes sense to. This is the path that puts statistical workloads on the same hardware ML training runs on, instead of next to it.

The Accelerator Hub. Beyond GPUs, custom compute is no longer exotic. TPUs, NPUs in modern laptops, dedicated ASICs in cloud instances — R2 will expose them through a single abstraction so the analyst writes R2 code and the runtime picks the right silicon. The user-facing language does not change. The compute layer becomes interchangeable. This is what lets R2 stay relevant on hardware nobody has shipped yet.

r2-calculus — the math base R never bundled. Numerical derivatives via Richardson extrapolation. Gauss-Kronrod adaptive quadrature for integrals where R's integrate() gives up. Higher moments, Jacobian and Hessian via forward-mode automatic differentiation. The mathematical machinery academic R users currently write by hand or import from three separate packages, shipped as standard library, with consistent numerics and citation-grade documentation.
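
Two of the techniques named above fit in a short Rust sketch: forward-mode automatic differentiation with dual numbers, and a central difference improved by one step of Richardson extrapolation. This is a hedged illustration of the textbook methods, limited to scalar first derivatives with addition and multiplication; the Dual type and both function names are hypothetical and say nothing about how r2-calculus will be designed.

```rust
use std::ops::{Add, Mul};

/// Dual number: a value together with its derivative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Dual {
    pub val: f64,
    pub der: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { val: self.val + o.val, der: self.der + o.der }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (uv)' = u'v + uv'.
    fn mul(self, o: Dual) -> Dual {
        Dual { val: self.val * o.val, der: self.der * o.val + self.val * o.der }
    }
}

/// Derivative of f at x via forward-mode AD: seed the input with derivative 1.
pub fn ad_derivative(f: impl Fn(Dual) -> Dual, x: f64) -> f64 {
    f(Dual { val: x, der: 1.0 }).der
}

/// Central difference with one Richardson step.
/// D(h) has error O(h^2); (4*D(h/2) - D(h)) / 3 cancels that term, leaving O(h^4).
pub fn richardson_derivative(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
    let d = |step: f64| (f(x + step) - f(x - step)) / (2.0 * step);
    (4.0 * d(h / 2.0) - d(h)) / 3.0
}
```

For example, ad_derivative(|x| x * x + x, 3.0) differentiates x^2 + x at 3 without any step size, while richardson_derivative(|x| x * x * x, 2.0, 0.1) approximates the derivative of x^3 at 2 from four function evaluations.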

r2-symbolic and r2-symreg: what nobody else is building. Symbolic differentiation. Algebraic simplification. And symbolic regression: deriving the functional form from a dataset, not fitting parameters to a form you guessed. Applications are real and significant: physics-informed machine learning that respects conservation laws, interpretable regulatory models where the equation is the deliverable, scientific discovery from sensor data where the relationship is unknown going in. No R package does this well. No mainstream language ships it. This is the direction I am most excited about, and it is what could turn Ardon-R2 from a faster R into a different kind of statistical tool.
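
To show what symbolic differentiation and algebraic simplification look like in miniature, here is a Rust sketch over a tiny expression tree in one variable. It is an illustration only: the Expr type and its methods are hypothetical, the simplifier knows just a handful of identities, and symbolic regression itself is not shown.

```rust
/// Expression in a single variable x.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Const(f64),
    Var,
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Symbolic derivative with respect to x (sum rule and product rule).
    pub fn diff(&self) -> Expr {
        match self {
            Expr::Const(_) => Expr::Const(0.0),
            Expr::Var => Expr::Const(1.0),
            Expr::Add(a, b) => Expr::Add(Box::new(a.diff()), Box::new(b.diff())),
            Expr::Mul(a, b) => Expr::Add(
                Box::new(Expr::Mul(Box::new(a.diff()), b.clone())),
                Box::new(Expr::Mul(a.clone(), Box::new(b.diff()))),
            ),
        }
    }

    /// Apply a few algebraic identities: constant folding, e+0, e*1, e*0.
    /// Note: e*0 = 0 is an algebraic rule; it ignores IEEE infinities and NaN.
    pub fn simplify(&self) -> Expr {
        match self {
            Expr::Add(a, b) => match (a.simplify(), b.simplify()) {
                (Expr::Const(x), Expr::Const(y)) => Expr::Const(x + y),
                (Expr::Const(z), e) if z == 0.0 => e,
                (e, Expr::Const(z)) if z == 0.0 => e,
                (l, r) => Expr::Add(Box::new(l), Box::new(r)),
            },
            Expr::Mul(a, b) => match (a.simplify(), b.simplify()) {
                (Expr::Const(x), Expr::Const(y)) => Expr::Const(x * y),
                (Expr::Const(z), _) if z == 0.0 => Expr::Const(0.0),
                (_, Expr::Const(z)) if z == 0.0 => Expr::Const(0.0),
                (Expr::Const(o), e) if o == 1.0 => e,
                (e, Expr::Const(o)) if o == 1.0 => e,
                (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
            },
            other => other.clone(),
        }
    }

    /// Evaluate the expression at a given x.
    pub fn eval(&self, x: f64) -> f64 {
        match self {
            Expr::Const(c) => *c,
            Expr::Var => x,
            Expr::Add(a, b) => a.eval(x) + b.eval(x),
            Expr::Mul(a, b) => a.eval(x) * b.eval(x),
        }
    }
}
```

Differentiating x*x produces 1*x + x*1, which the simplifier reduces to x + x. A fuller system would also collect like terms (giving 2*x) and support more operators.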

Embeddable statistical computing. Ardon-R2 was designed from day one to be a linkable library, not just a REPL. A C-callable interface and a Python binding will let inference servers, ETL pipelines, and existing data-science stacks call statistical validation in-process instead of shelling out to a separate R container. Statistical computing rejoins the AI infrastructure stack instead of sitting beside it.
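
A C-callable entry point in Rust typically looks like the sketch below. The function r2_mean and its signature are hypothetical, invented here to show the shape of such an interface; they are not part of any published Ardon-R2 API. Exporting it from a real library would additionally need an unmangled symbol name and a C-compatible crate type, which are omitted here.

```rust
/// Hypothetical C-callable function: mean of `len` doubles starting at `ptr`.
/// Returns NaN for a null pointer or a zero length.
///
/// # Safety
/// `ptr` must point to `len` readable, initialised f64 values.
pub unsafe extern "C" fn r2_mean(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        return f64::NAN;
    }
    // The caller guarantees the pointer and length describe a valid array.
    let xs = unsafe { std::slice::from_raw_parts(ptr, len) };
    xs.iter().sum::<f64>() / len as f64
}
```

From C the matching declaration would be double r2_mean(const double *ptr, size_t len); a Python binding could call the same symbol through a foreign-function layer without starting a separate process.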

Deterministic by construction. No garbage collector pauses. No thread scheduling jitter. No nondeterministic floating-point fallbacks. The same input produces the same output to the last bit, across runs, across machines, across years. For clinical trials, finance, model audit, and any regulated workflow, this is not a nicety — it is what makes results defensible. R2's pure-Rust foundation makes determinism a property of the engine, not a configuration flag.
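
One reason bit-level determinism takes deliberate design is that floating-point addition is not associative, so a reduction's result depends on the order in which partial sums are combined. The Rust sketch below shows the effect and one common remedy: fixing the chunk boundaries and the combination order in advance. The function names are hypothetical and this is not a description of Ardon-R2's internals.

```rust
/// (a + b) + c
pub fn group_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// a + (b + c)
pub fn group_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

/// Sum with a fixed chunk size. Partial sums are combined in chunk order,
/// so the result depends only on the data and `chunk`, not on how many
/// threads might compute the partial sums in a parallel version.
pub fn sum_fixed_chunks(xs: &[f64], chunk: usize) -> f64 {
    xs.chunks(chunk.max(1))
        .map(|c| c.iter().sum::<f64>())
        .sum()
}
```

With the inputs 0.1, 0.2, and 0.3, group_left and group_right return different f64 values. A reduction whose grouping varies with thread scheduling can therefore vary from run to run, while one with fixed grouping cannot.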

A package ecosystem that does not rot. R2 will grow its own package registry built on Cargo's model — frozen versions, audited dependencies, no shared-library archaeology. The plan is to make porting common CRAN packages systematic, so the long tail of R's ecosystem can move over without a rewrite per package. The goal is not to replace CRAN. The goal is that nobody who depends on Ardon-R2 ever loses their work the way the rattle generation lost theirs.

That is the road. v0.1.1 is the first step on it. Come help make it v1.0.

How to try it

The repository is at github.com/devendratandle/Ardon-R2. Clone it, run cargo build --release, and the binary lands at target/release/r2. There are benchmark scripts in bench/r_vs_r2/ you can run against your own R install. There's a comparison test harness that emits accuracy and performance reports side-by-side.

Issues, pull requests, and "this didn't work the way I expected" reports are all welcome. So are people who want to write the things v0.1.1 doesn't ship yet.

If your team carries an R workload into production and the rewrite tax is showing up on your roadmap, Ardon-R2 is worth a benchmark. If you are doing reproducible academic work and the version-hell of CRAN has cost you a paper, Ardon-R2 is worth a benchmark. If you find pure-Rust scientific computing interesting and want a non-trivial codebase to read, Ardon-R2 is worth reading.

Note on authorship

Ardon-R2 is an AI-assisted project. Architecture, code, tests, and prose are produced through pair-programming. I disclose it because I would want to know if I were reading it.

By Devendra Tandale
GitHub · LinkedIn · Open an issue

DEV Community: devendra tandle