In Part 1, I showed the basic architecture of RustyASG - my attempt to build a deep learning framework in Rust. Since then, a lot has happened. I want to share both the technical progress and the reality of working on such a complex project alone.
The Struggle Was Real
After publishing Part 1, I spent weeks battling errors. The thing is - nobody really helps with projects like this. The topic is too specialized. I asked questions on forums, searched through issues in similar projects... crickets.
Neural networks (the AI assistants) helped somewhat - not to solve specific bugs, but to organize my thoughts and break down the chaos into manageable pieces. When you're staring at 36 compilation errors and your brain is fried, sometimes you just need someone (or something) to help you see the structure.
The hardest part wasn't writing new features - it was fixing subtle type mismatches and reference issues in Rust's strict type system.
What Actually Got Done
GPU Backend with wgpu
The main achievement of this period is a working GPU backend. Not a toy, but real GPU acceleration using wgpu (a Rust implementation of the WebGPU API that runs on Vulkan, Metal, and DX12).
Implemented operations:
- Element-wise: Add, Sub, Mul, Div, Neg, Abs, Exp, Log, Sqrt
- Activations: ReLU, Sigmoid, Tanh, GELU, SiLU, LeakyReLU, Softmax
- Matrix operations: MatMul (including batched)
- Reductions: Sum, Mean, Variance
- Convolutions: Conv2d with stride and padding
- Shape operations: Transpose, Reshape, Broadcast
Each operation has its own WGSL shader generated at runtime.
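To make the runtime shader generation concrete, here is a minimal sketch of the idea: pick a WGSL expression per operation and splice it into a compute-shader template. The names `wgsl_expr` and `generate_elementwise_shader` are illustrative, not RustyASG's actual API, and the real codegen handles buffers, dispatch sizes, and bounds checks.

```rust
// Sketch of runtime WGSL codegen for element-wise ops.
// Hypothetical helper names; not RustyASG's real implementation.

fn wgsl_expr(op: &str) -> String {
    match op {
        "add" => "a[i] + b[i]".to_string(),
        "mul" => "a[i] * b[i]".to_string(),
        "relu" => "max(a[i], 0.0)".to_string(),
        _ => panic!("unsupported op: {op}"),
    }
}

fn generate_elementwise_shader(op: &str) -> String {
    // Braces in the WGSL body are doubled to escape them in format!.
    format!(
        r#"@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {{
    let i = gid.x;
    out[i] = {};
}}"#,
        wgsl_expr(op)
    )
}

fn main() {
    let shader = generate_elementwise_shader("relu");
    assert!(shader.contains("max(a[i], 0.0)"));
    println!("{shader}");
}
```

The upside of string-based codegen is that one template covers every element-wise op; the downside is that shader bugs only surface at pipeline-creation time, which is one reason the CPU-vs-GPU comparison tests below matter.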
Proper Error Handling
In the first version, the GPU code was littered with .unwrap() - the Rust equivalent of "crash and pray." I replaced all of that with proper Result types:
```rust
// Before (bad): panics if the node is missing
let node = graph.nodes.get(&node_id).unwrap();

// After (good): propagates a proper error
let node = graph.get_node(node_id)?;
```
Sounds simple, but this change touched hundreds of lines and uncovered many hidden issues. The fun part: get_node() returns Result, not Option, so I had to use map_err instead of ok_or_else. Rust teaches you to read type signatures carefully.
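The Option-vs-Result distinction is worth spelling out. A compilable sketch of the pattern (the `GraphError` type and `String` node payload here are simplifications, not RustyASG's real types):

```rust
// Sketch: Option -> Result inside the accessor, Result -> Result at call sites.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum GraphError {
    NodeNotFound(u64),
}

struct Graph {
    nodes: HashMap<u64, String>, // payload simplified to String
}

impl Graph {
    // HashMap::get returns Option; ok_or_else supplies the error for None.
    fn get_node(&self, id: u64) -> Result<&String, GraphError> {
        self.nodes.get(&id).ok_or_else(|| GraphError::NodeNotFound(id))
    }
}

// get_node already returns Result, so a caller with a different error
// type uses map_err, not ok_or_else.
fn node_len(graph: &Graph, id: u64) -> Result<usize, String> {
    let node = graph
        .get_node(id)
        .map_err(|e| format!("lookup failed: {e:?}"))?;
    Ok(node.len())
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1, "matmul".to_string());
    let g = Graph { nodes };
    assert_eq!(node_len(&g, 1), Ok(6));
    assert!(node_len(&g, 2).is_err());
}
```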
149 Tests Pass
The test suite now includes:
- 77 unit tests for core functionality
- 38 integration tests for neural network layers
- 26 GPU tests that compare CPU and GPU results
- 8 autograd tests for backpropagation
Every GPU operation is tested by running the same computation on CPU and GPU, then comparing results with tolerance for floating-point differences.
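The tolerance check itself is simple; a sketch of the comparison the tests rely on, with illustrative names and the usual absolute-plus-relative tolerance form:

```rust
// Element-wise closeness with absolute (atol) and relative (rtol)
// tolerance, in the spirit of numpy.allclose. Names are illustrative.
fn all_close(cpu: &[f32], gpu: &[f32], atol: f32, rtol: f32) -> bool {
    cpu.len() == gpu.len()
        && cpu
            .iter()
            .zip(gpu)
            .all(|(&c, &g)| (c - g).abs() <= atol + rtol * g.abs())
}

fn main() {
    let cpu = [1.0_f32, 2.0, 3.0];
    let gpu = [1.000001_f32, 1.999999, 3.000002];
    assert!(all_close(&cpu, &gpu, 1e-5, 1e-5));
    assert!(!all_close(&cpu, &[1.1, 2.0, 3.0], 1e-5, 1e-5));
}
```

The relative term matters because GPU floating-point results drift more for large magnitudes (different summation order, fused multiply-adds), so a pure absolute tolerance would either fail on big values or be uselessly loose on small ones.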
RustyASG vs Competitors: Honest Assessment
Let me be straight about where RustyASG stands compared to other options.
Compared to PyTorch/TensorFlow
Don't even compare. These are battle-tested frameworks with thousands of contributors, CUDA optimization, and years of development. RustyASG is an educational/experimental project.
Compared to tch-rs (Rust bindings to LibTorch)
tch-rs wins for production use.
- tch-rs: Full PyTorch functionality, CUDA support, maintained by the community
- RustyASG: Pure Rust, no external dependencies, but limited operations
Compared to Burn
Burn is more mature.
- Burn: Multiple backends, WebGPU support, active development, production focus
- RustyASG: Simpler codebase, easier to understand and modify
Compared to Candle (by Hugging Face)
Candle is better for inference.
- Candle: Optimized for inference, quantization support, GGUF models
- RustyASG: Training-focused with autograd, but slower execution
Honest Pros and Cons
What RustyASG Can Actually Do
✅ Train simple neural networks (MLP, small CNNs)
✅ Run on GPU via WebGPU (cross-platform: Windows, Linux, Mac, even web)
✅ Automatic differentiation for backpropagation
✅ Load/save models in SafeTensors format
✅ PyTorch-style API for datasets and dataloaders
✅ Full transformer building blocks (MultiHeadAttention, positional encodings)
What RustyASG Cannot Do (Yet or Ever)
❌ Run large models efficiently (no memory optimization)
❌ Match CUDA performance (WebGPU has overhead)
❌ Support all PyTorch operations (maybe 30% coverage)
❌ Run in production (not battle-tested)
❌ Distributed training
❌ Mixed precision training
When to Consider RustyASG
- Learning how deep learning frameworks work internally
- Need pure Rust without external dependencies
- Experimenting with custom operations
- Want to contribute to a young project
The repository includes examples that actually work:
- Linear Regression - Basic gradient descent
- XOR Problem - Simple MLP with backpropagation
- MNIST - Digit classification (if you supply the dataset)
```rust
// Simple training loop
for epoch in 0..epochs {
    for (batch_x, batch_y) in loader.iter() {
        // Forward pass
        let output = model.forward(&batch_x);
        let loss = mse_loss(&output, &batch_y);

        // Backward pass
        loss.backward();

        // Update weights
        optimizer.step();
        optimizer.zero_grad();
    }
}
```
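For readers unfamiliar with it, the `mse_loss` in that loop is plain mean squared error: the mean of the squared differences between output and target. A standalone sketch on flat slices (the framework version operates on graph tensors, not slices):

```rust
// Mean squared error: mean((output - target)^2), on flat f32 slices.
fn mse_loss(output: &[f32], target: &[f32]) -> f32 {
    assert_eq!(output.len(), target.len());
    let sum: f32 = output
        .iter()
        .zip(target)
        .map(|(o, t)| (o - t) * (o - t))
        .sum();
    sum / output.len() as f32
}

fn main() {
    // ((1-0)^2 + (2-2)^2 + (3-5)^2) / 3 = (1 + 0 + 4) / 3
    let loss = mse_loss(&[1.0, 2.0, 3.0], &[0.0, 2.0, 5.0]);
    assert!((loss - 5.0 / 3.0).abs() < 1e-6);
}
```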
Code Quality
- Published on crates.io: cargo add rustyasg
- Documentation with rustdoc
- No unsafe code (pure safe Rust)
- Clean module structure
Conclusion
RustyASG is not a PyTorch killer. It's not trying to be. It's an honest attempt to understand how deep learning frameworks work by building one from scratch in Rust.
If you want to learn about computational graphs, automatic differentiation, and GPU programming - the source code is open. If you want to train GPT-4 - use PyTorch.
Quick update: The engine actually works!
Just trained a neural network with 15,684 parameters using RustyASG.
The model classifies sequence patterns (ascending, descending, alternating, constant) with 100% accuracy on both train and test sets.
What's working:
- Automatic differentiation through complex graphs
- Residual/skip connections (gradients flow correctly!)
- Sinusoidal positional encodings
- Adam optimizer
- Forward pass, loss, backprop, parameter update - the full pipeline

It's not PyTorch, but it's proof that a from-scratch Rust deep learning framework can actually learn something. Small victory, but a victory nonetheless.
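Of the pieces mentioned above, the sinusoidal positional encodings are the easiest to show in isolation. This is the standard "Attention Is All You Need" formulation (sin for even dimensions, cos for odd), sketched as a free function rather than RustyASG's actual graph op:

```rust
// Sinusoidal positional encodings:
// PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
// PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
fn positional_encoding(seq_len: usize, d_model: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|pos| {
            (0..d_model)
                .map(|j| {
                    let i = (j / 2) as f32;
                    let angle =
                        pos as f32 / 10000f32.powf(2.0 * i / d_model as f32);
                    if j % 2 == 0 { angle.sin() } else { angle.cos() }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let pe = positional_encoding(4, 8);
    // At pos = 0 every sin term is 0 and every cos term is 1.
    assert_eq!(pe[0][0], 0.0);
    assert_eq!(pe[0][1], 1.0);
    // All values stay in [-1, 1].
    assert!(pe.iter().flatten().all(|&v| v.abs() <= 1.0));
}
```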