I built a local speech-to-text API in Rust that runs on CPU
I recently built Aximo, a self-hosted speech-to-text microservice designed to run locally on CPU, without depending on cloud APIs or external SaaS.
The idea was straightforward: I wanted an STT service that could be deployed like any other backend, stay fully local, and still be clean enough architecturally to evolve beyond a quick experiment.
Aximo is written in Rust, uses Parakeet v3 for local inference, exposes an HTTP API for transcription, and includes a WebSocket layer for realtime use cases. I also added Docker packaging, an OpenAPI specification, and a multi-crate workspace layout to keep the codebase modular from the start.
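To make the "HTTP plus WebSocket on one engine" split concrete, here is a minimal sketch of one way such a service could be layered. It assumes a transport-agnostic `Transcriber` trait living in a core crate, with the HTTP endpoint and the WebSocket session both calling into it. Every name here (`Transcriber`, `DummyEngine`, `handle_batch`, `StreamSession`) is hypothetical and not taken from Aximo. The 16 kHz mono `f32` input format and the fixed-length chunking for realtime use are also assumptions, and a dummy engine stands in for real Parakeet inference so the example runs on its own.

```rust
// Hypothetical sketch: a transport-agnostic core that both an HTTP handler
// and a WebSocket session could call. None of these names come from Aximo.

/// Assumed model input rate (16 kHz mono is an assumption in this sketch).
const SAMPLE_RATE: usize = 16_000;

/// Core abstraction a "core" crate might expose; an inference crate implements it.
pub trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Stand-in engine so the sketch runs without a model:
/// it only reports how many seconds of audio it received.
pub struct DummyEngine;

impl Transcriber for DummyEngine {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("<{:.1}s of audio>", samples.len() as f32 / SAMPLE_RATE as f32)
    }
}

/// What an HTTP "transcribe this file" endpoint would do once the body is decoded.
pub fn handle_batch<T: Transcriber>(engine: &T, samples: &[f32]) -> String {
    engine.transcribe(samples)
}

/// What a WebSocket session might do: buffer incoming frames and emit a
/// partial transcript for every `chunk_secs` seconds of audio received.
pub struct StreamSession<'a, T: Transcriber> {
    engine: &'a T,
    buffer: Vec<f32>,
    chunk_len: usize,
}

impl<'a, T: Transcriber> StreamSession<'a, T> {
    pub fn new(engine: &'a T, chunk_secs: usize) -> Self {
        // A zero-length chunk would make `push` loop forever.
        assert!(chunk_secs > 0, "chunk_secs must be positive");
        Self { engine, buffer: Vec::new(), chunk_len: chunk_secs * SAMPLE_RATE }
    }

    /// Append one frame; return partial transcripts for every full chunk now buffered.
    pub fn push(&mut self, frame: &[f32]) -> Vec<String> {
        self.buffer.extend_from_slice(frame);
        let mut out = Vec::new();
        while self.buffer.len() >= self.chunk_len {
            let chunk: Vec<f32> = self.buffer.drain(..self.chunk_len).collect();
            out.push(self.engine.transcribe(&chunk));
        }
        out
    }

    /// Transcribe any leftover audio when the client closes the connection.
    pub fn finish(self) -> Option<String> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(self.engine.transcribe(&self.buffer))
        }
    }
}

fn main() {
    let engine = DummyEngine;
    let two_seconds = vec![0.0_f32; 2 * SAMPLE_RATE];
    println!("{}", handle_batch(&engine, &two_seconds)); // <2.0s of audio>

    // Simulate a client sending 0.5 s frames over a socket, with 1 s chunks.
    let mut session = StreamSession::new(&engine, 1);
    for _ in 0..3 {
        for partial in session.push(&vec![0.0_f32; SAMPLE_RATE / 2]) {
            println!("partial: {partial}"); // printed once: <1.0s of audio>
        }
    }
    if let Some(last) = session.finish() {
        println!("final: {last}"); // <0.5s of audio>
    }
}
```

The point of the trait boundary is that the transport crates never touch the model directly: swapping the dummy for a real Parakeet-backed engine, or adding a new front end, would not require changes on the other side.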
One detail I particularly liked: I extended Swagger UI so I can record audio directly from the microphone and send it to the API for testing. It’s a small feature, but it makes the developer experience much nicer when iterating on the service.
At this point, I’d call it a solid MVP rather than a production-ready system, but it already works well for local experimentation and as a foundation for a self-hosted STT stack.
Repo: github.com/aximo
