In Part 1, I showed the basic architecture of RustyASG - my attempt to build a deep learning framework in Rust. Since then, a lot has happened. I want to share both the technical progress and the reality of working on such a complex project alone.
The Struggle Was Real
After publishing Part 1, I spent weeks battling errors. The thing is - nobody really helps with projects like this. The topic is too specialized. I asked questions on forums, searched through issues in similar projects... crickets.
Neural networks (the AI assistants) helped somewhat - not to solve specific bugs, but to organize my thoughts and break down the chaos into manageable pieces. When you're staring at 36 compilation errors and your brain is fried, sometimes you just need someone (or something) to help you see the structure.
The hardest part wasn't writing new features - it was fixing subtle type mismatches and reference issues in Rust's strict type system.
What Actually Got Done
GPU Backend with wgpu
The main achievement of this period is a working GPU backend. Not a toy, but real GPU acceleration using wgpu (a Rust implementation of the WebGPU API that runs on Vulkan, Metal, and DX12).
Implemented operations:
- Element-wise: Add, Sub, Mul, Div, Neg, Abs, Exp, Log, Sqrt
- Activations: ReLU, Sigmoid, Tanh, GELU, SiLU, LeakyReLU, Softmax
- Matrix operations: MatMul (including batched)
- Reductions: Sum, Mean, Variance
- Convolutions: Conv2d with stride and padding
- Shape operations: Transpose, Reshape, Broadcast
Each operation has its own WGSL shader generated at runtime.
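To make the runtime shader generation concrete, here is a minimal sketch of the idea: pick a WGSL expression per operation and splice it into a compute-shader template. The names `wgsl_expr` and `generate_elementwise_shader` are illustrative, not RustyASG's actual API, and the real codegen handles buffers, dispatch sizes, and bounds checks.

```rust
// Sketch of runtime WGSL codegen for element-wise ops.
// Hypothetical helper names; not RustyASG's real implementation.

fn wgsl_expr(op: &str) -> String {
    match op {
        "add" => "a[i] + b[i]".to_string(),
        "mul" => "a[i] * b[i]".to_string(),
        "relu" => "max(a[i], 0.0)".to_string(),
        _ => panic!("unsupported op: {op}"),
    }
}

fn generate_elementwise_shader(op: &str) -> String {
    // Braces in the WGSL body are doubled to escape them in format!.
    format!(
        r#"@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {{
    let i = gid.x;
    out[i] = {};
}}"#,
        wgsl_expr(op)
    )
}

fn main() {
    let shader = generate_elementwise_shader("relu");
    assert!(shader.contains("max(a[i], 0.0)"));
    println!("{shader}");
}
```

The upside of string-based codegen is that one template covers every element-wise op; the downside is that shader bugs only surface at pipeline-creation time, which is one reason the CPU-vs-GPU comparison tests below matter.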
Proper Error Handling
In the first version, the GPU code was littered with .unwrap() - the Rust equivalent of "crash and pray." I replaced all of that with proper Result types:
```rust
// Before (bad): panics if the node is missing
let node = graph.nodes.get(&node_id).unwrap();

// After (good): propagates a proper error
let node = graph.get_node(node_id)?;
```
Sounds simple, but this change touched hundreds of lines and uncovered many hidden issues. The fun part: get_node() returns Result, not Option, so I had to use map_err instead of ok_or_else. Rust teaches you to read type signatures carefully.
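The Option-vs-Result distinction is worth spelling out. A compilable sketch of the pattern (the `GraphError` type and `String` node payload here are simplifications, not RustyASG's real types):

```rust
// Sketch: Option -> Result inside the accessor, Result -> Result at call sites.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum GraphError {
    NodeNotFound(u64),
}

struct Graph {
    nodes: HashMap<u64, String>, // payload simplified to String
}

impl Graph {
    // HashMap::get returns Option; ok_or_else supplies the error for None.
    fn get_node(&self, id: u64) -> Result<&String, GraphError> {
        self.nodes.get(&id).ok_or_else(|| GraphError::NodeNotFound(id))
    }
}

// get_node already returns Result, so a caller with a different error
// type uses map_err, not ok_or_else.
fn node_len(graph: &Graph, id: u64) -> Result<usize, String> {
    let node = graph
        .get_node(id)
        .map_err(|e| format!("lookup failed: {e:?}"))?;
    Ok(node.len())
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1, "matmul".to_string());
    let g = Graph { nodes };
    assert_eq!(node_len(&g, 1), Ok(6));
    assert!(node_len(&g, 2).is_err());
}
```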
149 Tests Pass
The test suite now includes:
- 77 unit tests for core functionality
- 38 integration tests for neural network layers
- 26 GPU tests that compare CPU and GPU results
- 8 autograd tests for backpropagation
Every GPU operation is tested by running the same computation on CPU and GPU, then comparing results with tolerance for floating-point differences.
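The tolerance check itself is simple; a sketch of the comparison the tests rely on, with illustrative names and the usual absolute-plus-relative tolerance form:

```rust
// Element-wise closeness with absolute (atol) and relative (rtol)
// tolerance, in the spirit of numpy.allclose. Names are illustrative.
fn all_close(cpu: &[f32], gpu: &[f32], atol: f32, rtol: f32) -> bool {
    cpu.len() == gpu.len()
        && cpu
            .iter()
            .zip(gpu)
            .all(|(&c, &g)| (c - g).abs() <= atol + rtol * g.abs())
}

fn main() {
    let cpu = [1.0_f32, 2.0, 3.0];
    let gpu = [1.000001_f32, 1.999999, 3.000002];
    assert!(all_close(&cpu, &gpu, 1e-5, 1e-5));
    assert!(!all_close(&cpu, &[1.1, 2.0, 3.0], 1e-5, 1e-5));
}
```

The relative term matters because GPU floating-point results drift more for large magnitudes (different summation order, fused multiply-adds), so a pure absolute tolerance would either fail on big values or be uselessly loose on small ones.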
RustyASG vs Competitors: Honest Assessment
Let me be straight about where RustyASG stands compared to other options.
Compared to PyTorch/TensorFlow
Don't even compare. These are battle-tested frameworks with thousands of contributors, CUDA optimization, and years of development. RustyASG is an educational/experimental project.
Compared to tch-rs (Rust bindings to LibTorch)
tch-rs wins for production use.
- tch-rs: Full PyTorch functionality, CUDA support, maintained by the community
- RustyASG: Pure Rust, no external dependencies, but limited operations
Compared to Burn
Burn is more mature.
- Burn: Multiple backends, WebGPU support, active development, production focus
- RustyASG: Simpler codebase, easier to understand and modify
Compared to Candle (by Hugging Face)
Candle is better for inference.
- Candle: Optimized for inference, quantization support, GGUF models
- RustyASG: Training-focused with autograd, but slower execution
Honest Pros and Cons
What RustyASG Can Actually Do
✅ Train simple neural networks (MLP, small CNNs)
✅ Run on GPU via WebGPU (cross-platform: Windows, Linux, Mac, even web)
✅ Automatic differentiation for backpropagation
✅ Load/save models in SafeTensors format
✅ PyTorch-style API for datasets and dataloaders
✅ Full transformer building blocks (MultiHeadAttention, positional encodings)
What RustyASG Cannot Do (Yet or Ever)
❌ Run large models efficiently (no memory optimization)
❌ Match CUDA performance (WebGPU has overhead)
❌ Support all PyTorch operations (maybe 30% coverage)
❌ Run in production (not battle-tested)
❌ Distributed training
❌ Mixed precision training
When to Consider RustyASG
- Learning how deep learning frameworks work internally
- Need pure Rust without external dependencies
- Experimenting with custom operations
- Want to contribute to a young project
The repository includes examples that actually work:
- Linear Regression - Basic gradient descent
- XOR Problem - Simple MLP with backpropagation
- MNIST - Digit classification (if you supply the dataset)
```rust
// Simple training loop
for epoch in 0..epochs {
    for (batch_x, batch_y) in loader.iter() {
        // Forward pass
        let output = model.forward(&batch_x);
        let loss = mse_loss(&output, &batch_y);

        // Backward pass
        loss.backward();

        // Update weights
        optimizer.step();
        optimizer.zero_grad();
    }
}
```
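For readers unfamiliar with it, the `mse_loss` in that loop is plain mean squared error: the mean of the squared differences between output and target. A standalone sketch on flat slices (the framework version operates on graph tensors, not slices):

```rust
// Mean squared error: mean((output - target)^2), on flat f32 slices.
fn mse_loss(output: &[f32], target: &[f32]) -> f32 {
    assert_eq!(output.len(), target.len());
    let sum: f32 = output
        .iter()
        .zip(target)
        .map(|(o, t)| (o - t) * (o - t))
        .sum();
    sum / output.len() as f32
}

fn main() {
    // ((1-0)^2 + (2-2)^2 + (3-5)^2) / 3 = (1 + 0 + 4) / 3
    let loss = mse_loss(&[1.0, 2.0, 3.0], &[0.0, 2.0, 5.0]);
    assert!((loss - 5.0 / 3.0).abs() < 1e-6);
}
```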
Code Quality
- Published on crates.io: cargo add rustyasg
- Documentation with rustdoc
- No unsafe code (pure safe Rust)
- Clean module structure
Conclusion
RustyASG is not a PyTorch killer. It's not trying to be. It's an honest attempt to understand how deep learning frameworks work by building one from scratch in Rust.
If you want to learn about computational graphs, automatic differentiation, and GPU programming - the source code is open. If you want to train GPT-4 - use PyTorch.
Quick update: The engine actually works!
Just trained a neural network with 15,684 parameters using RustyASG.
The model classifies sequence patterns (ascending, descending, alternating, constant) with 100% accuracy on both train and test sets.
What's working:
- Automatic differentiation through complex graphs
- Residual/skip connections (gradients flow correctly!)
- Sinusoidal positional encodings
- Adam optimizer
- Forward pass, loss, backprop, parameter update - the full pipeline

It's not PyTorch, but it's proof that a from-scratch Rust deep learning framework can actually learn something. Small victory, but a victory nonetheless.
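Of the pieces mentioned above, the sinusoidal positional encodings are the easiest to show in isolation. This is the standard "Attention Is All You Need" formulation (sin for even dimensions, cos for odd), sketched as a free function rather than RustyASG's actual graph op:

```rust
// Sinusoidal positional encodings:
// PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
// PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
fn positional_encoding(seq_len: usize, d_model: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|pos| {
            (0..d_model)
                .map(|j| {
                    let i = (j / 2) as f32;
                    let angle =
                        pos as f32 / 10000f32.powf(2.0 * i / d_model as f32);
                    if j % 2 == 0 { angle.sin() } else { angle.cos() }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let pe = positional_encoding(4, 8);
    // At pos = 0 every sin term is 0 and every cos term is 1.
    assert_eq!(pe[0][0], 0.0);
    assert_eq!(pe[0][1], 1.0);
    // All values stay in [-1, 1].
    assert!(pe.iter().flatten().all(|&v| v.abs() <= 1.0));
}
```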