Aximo — a local Rust STT API for CPU-only inference

#ai #api #rust #showdev

I built a local speech-to-text API in Rust that runs on CPU

I recently built Aximo, a self-hosted speech-to-text microservice designed to run locally on CPU, without depending on cloud APIs or external SaaS.

The idea was straightforward: I wanted an STT service that could be deployed like any other backend, stay fully local, and still be clean enough architecturally to evolve beyond a quick experiment.

Aximo is written in Rust, uses Parakeet v3 for local inference, exposes an HTTP API for transcription, and includes a WebSocket layer for real-time use cases. I also added Docker, OpenAPI, and a multi-crate workspace layout to keep the codebase modular from the start.
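To give a feel for what calling a local transcription endpoint over HTTP could look like, here is a minimal client-side sketch using only the Rust standard library. It is an illustration under assumptions, not Aximo's actual contract: the `/transcribe` path, the raw `audio/wav` request body, and the helper names `pcm16_to_wav`, `build_request`, and `transcribe` are all hypothetical, and the real API may well expect multipart uploads or a different route. The OpenAPI spec served by the service is the source of truth.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Wrap mono 16-bit PCM samples in a minimal 44-byte WAV (RIFF) header.
fn pcm16_to_wav(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut wav = Vec::with_capacity(44 + data_len as usize);
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // remaining file size
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    wav.extend_from_slice(&2u16.to_le_bytes()); // block align (bytes per frame)
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        wav.extend_from_slice(&sample.to_le_bytes());
    }
    wav
}

/// Build a raw HTTP/1.1 POST request carrying the audio as the body.
/// ASSUMPTION: the path and the raw `audio/wav` body are hypothetical.
fn build_request(host: &str, path: &str, body: &[u8]) -> Vec<u8> {
    let head = format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: audio/wav\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
        path,
        host,
        body.len()
    );
    let mut request = head.into_bytes();
    request.extend_from_slice(body);
    request
}

/// Send the request to a locally running service and return the raw
/// HTTP response (status line, headers, and body) as text.
fn transcribe(addr: &str, path: &str, wav: &[u8]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(&build_request(addr, path, wav))?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a service listening locally, a call such as `transcribe("127.0.0.1:8080", "/transcribe", &wav)` would return the raw response text; the address and port here are placeholders as well. A real client would more likely use an HTTP library, but the std-only version keeps the moving parts visible.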

One detail I particularly liked: I extended Swagger UI so I can record audio directly from the microphone and send it to the API for testing. It’s a small feature, but it makes the developer experience much nicer when iterating on the service.

At this point, I’d call it a solid MVP rather than a production-ready system, but it already works well for local experimentation and as a foundation for a self-hosted STT stack.
