DEV Community

Ravikash Gupta
QORA - Native Rust LLM Inference Engine

Pure Rust inference engine for the SmolLM3-3B language model. No Python runtime, no CUDA, no external dependencies. Single executable + quantized weights = portable AI on any machine.

Download 🤗: https://huggingface.co/qoranet/QORA-LLM

| Spec | Value |
| --- | --- |
| Base Model | SmolLM3-3B (HuggingFaceTB/SmolLM3-3B) |
| Parameters | 3.07 billion |
| Quantization | Q4 (4-bit symmetric, group_size=32) |
| Model Size | 1.68 GB (Q4) / ~6 GB (F16) |
| Executable | 6.7 MB |
| Context Length | 65,536 tokens (up to 128K with YaRN) |
| Platform | Windows x86_64 (CPU-only) |
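Q4 with group_size=32 means each run of 32 weights shares a single f32 scale factor. A minimal sketch of symmetric 4-bit group quantization, under assumed conventions (one i8 per value for clarity; a real implementation like QORA's would pack two 4-bit values per byte, and its exact rounding rules may differ):

```rust
/// Quantize weights to symmetric 4-bit values with per-group scales.
/// Hypothetical helper, not QORA's actual code.
fn quantize_q4(weights: &[f32]) -> (Vec<i8>, Vec<f32>) {
    let mut quants = Vec::with_capacity(weights.len());
    let mut scales = Vec::new();
    for group in weights.chunks(32) {
        // Largest magnitude in the group determines the scale.
        let max_abs = group.iter().fold(0f32, |m, w| m.max(w.abs()));
        // Symmetric mapping of [-max_abs, max_abs] onto the 4-bit range [-7, 7].
        let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
        scales.push(scale);
        for w in group {
            quants.push((w / scale).round().clamp(-7.0, 7.0) as i8);
        }
    }
    (quants, scales)
}

/// Reconstruct approximate f32 weights from quantized values + scales.
fn dequantize_q4(quants: &[i8], scales: &[f32]) -> Vec<f32> {
    quants
        .chunks(32)
        .zip(scales)
        .flat_map(|(group, &s)| group.iter().map(move |&q| q as f32 * s))
        .collect()
}
```

This layout stores one f32 scale per 32 weights, which is where the roughly 4x size reduction over F16 (1.68 GB vs ~6 GB above) comes from.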

Key Architectural Innovation: NoPE (No Position Encoding)
SmolLM3 uses a 3:1 NoPE interleaving: every fourth layer (layers 3, 7, 11, 15, 19, 23, 27, 31, 35) skips positional encoding entirely, while the remaining 75% of layers apply standard RoPE. This reduces computational overhead and improves long-context generalization.
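The layer indices listed above fall out of a simple modulo rule: the 3:1 interleaving singles out every fourth layer. A minimal sketch of the selection rule (hypothetical helper names, not QORA's actual code):

```rust
/// True for the layers picked out by SmolLM3's 3:1 interleaving
/// pattern: every fourth layer, 0-indexed (3, 7, 11, ..., 35).
/// Hypothetical helper, not QORA's actual implementation.
fn is_fourth_layer(layer_idx: usize) -> bool {
    layer_idx % 4 == 3
}

/// All interleaved layer indices for a model with `num_layers` layers.
fn interleaved_layers(num_layers: usize) -> Vec<usize> {
    (0..num_layers).filter(|&i| is_fourth_layer(i)).collect()
}
```

For SmolLM3-3B's 36 layers this yields exactly 9 layers, i.e. one quarter of the stack.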

Performance Benchmarks
Test Hardware: Windows 11, CPU-only (no GPU acceleration)
