Most face swap tools are Python scripts stitched together with PyTorch, OpenCV, and a prayer. They work, but they drag in gigabytes of dependencies, need CUDA configured just right, and fall apart the moment you try to run them in real time.
I wanted to see if the entire pipeline could run in pure Rust. No Python. No PyTorch. No wrappers. One binary that you download, unpack, and run.
Turns out it can. 60fps on a webcam feed.
The Pipeline
Four neural networks run sequentially on every frame:
RetinaFace detects faces and extracts five landmark points. ArcFace computes a 512-dimensional embedding from the source face. InSwapper takes the target face region and the source embedding and produces the swapped face. GFPGAN optionally enhances the result for higher-quality output.
All four models run through ONNX Runtime. No custom CUDA kernels, no framework overhead. Just raw tensor in, tensor out.
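As a sketch of the shape of the per-frame sequence (not the project's actual code), the data flow looks roughly like this. All four model calls are stubbed with hypothetical function names and types; in the real pipeline each one is an ONNX Runtime session invocation.

```rust
/// Five landmark points from the detector, as (x, y) pixel coordinates.
type Landmarks = [(f32, f32); 5];
/// 512-dimensional identity embedding from ArcFace.
type Embedding = [f32; 512];

struct Frame { width: usize, height: usize, rgb: Vec<u8> }

// Stubs standing in for the four ONNX sessions (hypothetical signatures).
fn retinaface_detect(_frame: &Frame) -> Option<Landmarks> { Some([(0.0, 0.0); 5]) }
// Runs once on the source image at startup, not per frame.
fn arcface_embed(_face: &Frame) -> Embedding { [0.0; 512] }
fn inswapper_swap(target: &Frame, _id: &Embedding) -> Frame {
    Frame { width: target.width, height: target.height, rgb: target.rgb.clone() }
}
fn gfpgan_enhance(face: Frame) -> Frame { face }

/// One pass over a frame: detect, swap using the cached source embedding,
/// optionally enhance. Returns None when no face is found.
fn process(frame: &Frame, source_embedding: &Embedding, enhance: bool) -> Option<Frame> {
    let _landmarks = retinaface_detect(frame)?;
    let swapped = inswapper_swap(frame, source_embedding);
    Some(if enhance { gfpgan_enhance(swapped) } else { swapped })
}
```

The key property this structure captures is that the expensive ArcFace embedding is computed once up front, so the per-frame cost is detection, swap, and optional enhancement.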
Architecture
Three threads, no locks on the hot path:
The capture thread grabs frames from the webcam via nokhwa and publishes them through an ArcSwap. The pipeline thread picks up new frames, runs inference, and publishes processed frames through a second ArcSwap. The UI thread reads whichever buffer is current and renders through egui.
No mutexes on frame data. No channels. No async. Just atomic generation counters and lock-free pointer swaps. The shared state structs are 64 bytes each, aligned to cache lines to prevent false sharing between cores.
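The generation-counter and cache-line-alignment ideas can be sketched with std alone. This is a minimal illustration, not the project's actual struct, and it leaves out the frame pointer that the real code publishes through arc_swap:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Per-edge shared state, padded to a full 64-byte cache line so the
/// capture->pipeline slot and the pipeline->UI slot never share a line
/// (no false sharing between cores).
#[repr(align(64))]
struct SharedSlot {
    /// Bumped by the producer on every publish; the consumer compares it
    /// to the last generation it consumed to detect a fresh frame.
    generation: AtomicU64,
}

impl SharedSlot {
    /// Producer side: announce a new frame, return its generation number.
    fn publish(&self) -> u64 {
        self.generation.fetch_add(1, Ordering::Release) + 1
    }
    /// Consumer side: read the newest generation without blocking.
    fn latest(&self) -> u64 {
        self.generation.load(Ordering::Acquire)
    }
}
```

`#[repr(align(64))]` rounds the struct's size up to 64 bytes, which is what makes the "64 bytes each, aligned to cache lines" layout hold. Release/Acquire ordering is enough here: the consumer only needs to see the frame data that was written before the matching `publish`.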
Zero Allocation Hot Path
Every pixel buffer in the pipeline is pre-allocated at startup. None of the per-frame steps allocate during processing: not the RGBA-to-RGB conversion, the tensor fill, the affine warp, or the paste-back blending. The only heap allocation per frame is the Arc wrapping the final snapshot, which is unavoidable with the ArcSwap pattern.
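The RGBA-to-RGB step is the simplest place to see the pattern. A sketch of what a zero-allocation conversion into a caller-owned, startup-allocated buffer can look like (function name and signature are illustrative, not the project's):

```rust
/// Convert packed RGBA pixels into a pre-allocated RGB buffer, dropping
/// the alpha byte. Writes in place through the `&mut` slice, so nothing
/// is allocated per call; the same buffer is reused every frame.
fn rgba_to_rgb_into(rgba: &[u8], rgb: &mut [u8]) {
    debug_assert_eq!(rgba.len() % 4, 0);
    debug_assert_eq!(rgb.len(), rgba.len() / 4 * 3);
    for (src, dst) in rgba.chunks_exact(4).zip(rgb.chunks_exact_mut(3)) {
        dst.copy_from_slice(&src[..3]); // copy R, G, B; skip A
    }
}
```

`chunks_exact` also lets the compiler drop the per-pixel bounds checks, so this tends to vectorize well without any unsafe code.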
What I Learned
Rust is genuinely excellent for this kind of work. The ownership model made the multithreaded architecture trivial to get right. No data races, no use-after-free, no mystery crashes at 3am. The borrow checker complained exactly once during development, and it was correct.
ONNX Runtime through the ort crate is production ready. Model loading, tensor creation, and inference are all straightforward. The only rough edge is the session builder API requiring mutable references in surprising places.
egui is underrated for real-time applications. Immediate mode rendering with zero retained state makes it perfect for a live video feed. The texture upload path is clean and fast enough for 60fps without vsync tricks.
Try It
The release archive includes the binary and all models. Download, unpack, run. Nothing else needed.
GitHub: github.com/despite-death/face-swap
Feedback on the architecture and code is welcome. I'm especially interested in hearing from anyone with experience in ONNX Runtime multi-session threading: running the models in parallel instead of sequentially could push this well past 60fps.