<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Snake River Ai</title>
    <description>The latest articles on DEV Community by Snake River Ai (@cryforyou22).</description>
    <link>https://dev.to/cryforyou22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867005%2Fea79b068-34b5-45fa-b3fb-3c79a8378903.jpg</url>
      <title>DEV Community: Snake River Ai</title>
      <link>https://dev.to/cryforyou22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cryforyou22"/>
    <language>en</language>
    <item>
      <title>I built a sovereign voice layer that routes to 11 AI providers — here's the architecture</title>
      <dc:creator>Snake River Ai</dc:creator>
      <pubDate>Wed, 29 Apr 2026 21:53:31 +0000</pubDate>
      <link>https://dev.to/cryforyou22/i-built-a-sovereign-voice-layer-that-routes-to-11-ai-providers-heres-the-architecture-1b52</link>
      <guid>https://dev.to/cryforyou22/i-built-a-sovereign-voice-layer-that-routes-to-11-ai-providers-heres-the-architecture-1b52</guid>
      <description>&lt;p&gt;After two years of bouncing between Claude desktop, ChatGPT voice, Gemini, and a half-dozen Ollama frontends, I got tired of the wake-word thrash. Every assistant assumes you've picked their team forever.&lt;/p&gt;

&lt;p&gt;So I built BRAGI — a voice layer that runs locally, listens locally, and routes to whichever AI I tell it to. Including the one running on the same machine.&lt;/p&gt;

&lt;p&gt;This post is the architecture, not a sales pitch. If you've been thinking about building something similar, here's what I learned shipping v0.2.&lt;/p&gt;

&lt;h2&gt;The pipeline&lt;/h2&gt;

&lt;pre&gt;Mic input
  ↓
openwakeword (local) — "Hey Jarvis"
  ↓
faster-whisper medium (local, GPU optional)
  ↓
Provider router (settings UI picks destination)
  ↓
[Cloud: Claude / OpenAI / Gemini / Grok / Groq / Together / HuggingFace]
[Local: Ollama / LM Studio / FREYA / Echo]
  ↓
TTS (eSpeak free, OpenAI Nova BYOK)
  ↓
Speaker output&lt;/pre&gt;

&lt;p&gt;Audio never leaves the machine. Only transcribed text goes to whichever cloud you picked, if any.&lt;/p&gt;
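&lt;p&gt;To make the flow concrete, here's a minimal sketch of the glue loop. The stage objects and method names (&lt;code&gt;wait_for_wake&lt;/code&gt;, &lt;code&gt;transcribe&lt;/code&gt;, &lt;code&gt;speak&lt;/code&gt;) are illustrative, not BRAGI's actual internals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def assistant_loop(wake, stt, router, tts):
    # Hypothetical glue: every stage runs locally except the provider,
    # which may be cloud or local depending on user settings.
    while True:
        audio = await wake.wait_for_wake()   # blocks until "Hey Jarvis"
        text = stt.transcribe(audio)         # faster-whisper, on-device
        provider = router.current()          # user-selected destination
        parts = []
        async for token in provider.respond(text, history=[]):
            parts.append(token)
        tts.speak("".join(parts))            # eSpeak or OpenAI Nova

# asyncio.run(assistant_loop(wake, stt, router, tts))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;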
&lt;h2&gt;Wake word&lt;/h2&gt;

&lt;p&gt;openwakeword is the right call for a sovereign product. Picovoice's Porcupine has better detection quality, but it locks you into a paid commercial license. openwakeword is Apache 2.0 and runs on CPU.&lt;/p&gt;

&lt;p&gt;The catch: training your own custom model requires matching the feature dimensions to whichever preprocessor version you're targeting. I burned half a day on a model that had 96×103 features when openwakeword expected 32×147. v0.2 ships with the stock "Hey Jarvis" model and includes the custom "Hey BRAGI" model for users with compatible hardware.&lt;/p&gt;
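&lt;p&gt;If you stick with a stock model, the detection loop itself is short. A minimal sketch against openwakeword's Python API; the frame size comes from its docs, and the 0.5 threshold is something you'd tune locally:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["hey_jarvis"])  # stock model, no training needed

def is_wake(frame: np.ndarray) -&amp;gt; bool:
    # frame: 1280 samples (80 ms) of 16 kHz int16 PCM from the mic
    scores = oww.predict(frame)  # {model_name: activation score}
    return any(score &amp;gt; 0.5 for score in scores.values())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;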
&lt;h2&gt;STT&lt;/h2&gt;

&lt;p&gt;faster-whisper medium on CUDA is the sweet spot. Tiny is too inaccurate for real conversation; large is overkill for short voice commands. Medium gets ~1 second latency on a midrange GPU and handles multilingual input out of the box.&lt;/p&gt;

&lt;p&gt;Critical detail: instantiate Whisper once at startup, never per-request. First inference call takes 5-10 seconds to warm CUDA. Users won't tolerate that on every wake.&lt;/p&gt;
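&lt;p&gt;In practice that means a module-level singleton plus a throwaway warm-up call at boot. A sketch with faster-whisper; the &lt;code&gt;silence.wav&lt;/code&gt; warm-up clip is my stand-in, any short audio file works:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from faster_whisper import WhisperModel

# Load once at startup, never per request.
model = WhisperModel("medium", device="cuda", compute_type="float16")

def warm_up() -&amp;gt; None:
    # One throwaway pass so the first real wake isn't 5-10 s slow.
    segments, _info = model.transcribe("silence.wav")
    list(segments)  # transcribe() is lazy; drain the generator

def transcribe(path: str) -&amp;gt; str:
    segments, _info = model.transcribe(path, beam_size=5)
    return " ".join(seg.text.strip() for seg in segments)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;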
&lt;h2&gt;The router&lt;/h2&gt;

&lt;p&gt;This was the hardest part. Each provider has a different SDK, a different streaming format, and a different auth pattern. The router abstracts all of that into one interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_ready&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each provider implementation handles its own SDK quirks. The router just picks one based on user settings or voice command ("BRAGI, switch to Claude") and calls &lt;code&gt;respond()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For local models I support both Ollama (HTTP API) and LM Studio (OpenAI-compatible HTTP API). Both run on the user's machine. Both look identical to the router.&lt;/p&gt;
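&lt;p&gt;For a concrete sense of what one adapter looks like, here's a sketch of an Ollama provider against its &lt;code&gt;/api/chat&lt;/code&gt; streaming endpoint, which returns newline-delimited JSON. httpx and the dict-shaped history are my assumptions, not BRAGI's exact code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import httpx

class OllamaProvider:
    def __init__(self, model: str = "llama3", base: str = "http://127.0.0.1:11434"):
        self.model, self.base = model, base

    def name(self) -&amp;gt; str:
        return f"ollama:{self.model}"

    def is_ready(self) -&amp;gt; bool:
        try:  # a running Ollama answers /api/tags with its model list
            return httpx.get(f"{self.base}/api/tags", timeout=2).status_code == 200
        except httpx.HTTPError:
            return False

    async def respond(self, prompt: str, history: list):
        messages = [*history, {"role": "user", "content": prompt}]
        payload = {"model": self.model, "messages": messages, "stream": True}
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{self.base}/api/chat", json=payload) as r:
                async for line in r.aiter_lines():
                    if not line:
                        continue
                    chunk = json.loads(line)
                    if not chunk.get("done"):
                        yield chunk["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The LM Studio adapter is the same shape with a different URL and payload, since it speaks the OpenAI wire format.&lt;/p&gt;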

&lt;h2&gt;TTS&lt;/h2&gt;

&lt;p&gt;eSpeak ships with the installer because it's free, offline, and covers 100+ languages. It sounds robotic. That's fine. People who want premium voice can paste an OpenAI API key and use Nova.&lt;/p&gt;

&lt;p&gt;I tried Kokoro for higher-quality offline TTS. It worked great in dev, but production builds kept hitting a 404 on the default voice file on Hugging Face. So v0.2 ships eSpeak as the default and Kokoro as best-effort.&lt;/p&gt;
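&lt;p&gt;The voice selection is a dumb fallback chain rather than anything clever. A sketch; the OpenAI call follows their TTS API, and playback of the written file is elided:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

def speak(text: str, openai_key: str | None = None) -&amp;gt; None:
    if openai_key:
        from openai import OpenAI
        resp = OpenAI(api_key=openai_key).audio.speech.create(
            model="tts-1", voice="nova", input=text
        )
        resp.write_to_file("reply.mp3")  # then play it back locally
    else:
        # eSpeak ships with the installer and sits on PATH
        subprocess.run(["espeak", text], check=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;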

&lt;h2&gt;The settings UI&lt;/h2&gt;

&lt;p&gt;Local web UI on &lt;a href="http://127.0.0.1:7777" rel="noopener noreferrer"&gt;http://127.0.0.1:7777&lt;/a&gt;. Configure providers, paste API keys, pick voices, manage license. Page lives on the user's machine. No account, no login, no cloud dashboard.&lt;/p&gt;

&lt;p&gt;API keys live in a local vault. They never leave the machine. The product is sovereignty — that has to be true at every layer.&lt;/p&gt;
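&lt;p&gt;The sovereignty property of the dashboard comes down to one line: bind the server to loopback only. A trimmed sketch (routes and vault wiring omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/api/providers")
def providers():
    # Illustrative route; the real UI also manages keys and voices.
    return {"active": "ollama", "available": ["claude", "openai", "ollama"]}

if __name__ == "__main__":
    # 127.0.0.1 means nothing else on the LAN can reach this page.
    uvicorn.run(app, host="127.0.0.1", port=7777)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;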

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;openwakeword for wake detection&lt;/li&gt;
&lt;li&gt;faster-whisper for STT&lt;/li&gt;
&lt;li&gt;eSpeak / OpenAI Nova for TTS&lt;/li&gt;
&lt;li&gt;FastAPI for the local settings server&lt;/li&gt;
&lt;li&gt;pythonw.exe in tray mode for daily use&lt;/li&gt;
&lt;li&gt;PyInstaller for bundling&lt;/li&gt;
&lt;li&gt;NSIS for the Windows installer&lt;/li&gt;
&lt;li&gt;~169MB installer, Win10/11&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom wake word training is harder than the docs admit.&lt;/strong&gt; openwakeword's preprocessor is versioned and the feature dims have to match exactly. Document this for users who want to train their own.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PyInstaller + 4GB CUDA torch builds blow past NSIS's 2GB single-file limit.&lt;/strong&gt; I had to move torch + Kokoro to a first-run download instead of bundling them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust the embedded Python's &lt;code&gt;python311._pth&lt;/code&gt; defaults.&lt;/strong&gt; User-site contamination from &lt;code&gt;%APPDATA%\Python&lt;/code&gt; will silently break your install. Always launch with the &lt;code&gt;-s -E&lt;/code&gt; flags (see the launcher sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
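&lt;p&gt;For point 3, the launcher looks roughly like this. &lt;code&gt;-s&lt;/code&gt; skips the user site-packages and &lt;code&gt;-E&lt;/code&gt; ignores &lt;code&gt;PYTHONPATH&lt;/code&gt;/&lt;code&gt;PYTHONHOME&lt;/code&gt;; the install path and module name are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

# Tray launcher: embedded interpreter, isolated from any Python the
# user already has installed.
subprocess.Popen(
    [r"C:\Program Files\BRAGI\python\pythonw.exe", "-s", "-E", "-m", "bragi"],
    creationflags=subprocess.CREATE_NO_WINDOW,  # no console flash on Windows
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;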

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;v0.3 will likely add: better Kokoro fallback, custom wake word training UI, multi-room concurrency. The architecture supports it — I just need to ship v0.2 first and see what users actually ask for.&lt;/p&gt;

&lt;p&gt;If you want to see it: &lt;a href="https://clintwave84.gumroad.com/l/leetkd" rel="noopener noreferrer"&gt;clintwave84.gumroad.com/l/leetkd&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've built something similar and want to compare notes — drop a comment. Especially curious how others have handled the provider abstraction across cloud + local.&lt;/p&gt;

&lt;p&gt;— Built by one guy in Idaho. Snake River AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>voice</category>
      <category>opensource</category>
    </item>
    <item>
      <title>We built an AI smart contract auditor for $199 — here's how</title>
      <dc:creator>Snake River Ai</dc:creator>
      <pubDate>Wed, 08 Apr 2026 05:55:02 +0000</pubDate>
      <link>https://dev.to/cryforyou22/we-built-an-ai-smart-contract-auditor-for-199-heres-how-41de</link>
      <guid>https://dev.to/cryforyou22/we-built-an-ai-smart-contract-auditor-for-199-heres-how-41de</guid>
      <description>&lt;p&gt;Smart contract security is a billion-dollar problem. Hacks, exploits, and rug pulls cost the Web3 ecosystem hundreds of millions every year — and most of them stem from bugs that a careful audit would have caught. The problem? Professional audits from top firms can run $20,000 to $100,000+, putting them out of reach for indie developers and small teams.&lt;/p&gt;

&lt;p&gt;We decided to change that. Based out of Boise, Idaho, our team at Snake River AI built a fully automated smart contract auditor that runs for a flat $199 per audit. Here's how we did it — and what we learned along the way.&lt;/p&gt;

&lt;h2&gt;Why Idaho?&lt;/h2&gt;

&lt;p&gt;When people think of AI infrastructure, they picture Silicon Valley server farms or AWS data centers in Virginia. We took a different path. Idaho's energy costs are among the lowest in the country, and the state's investment in renewable power (hydro and wind) made it an attractive location for running GPU workloads sustainably. We stood up our own local inference cluster in the Treasure Valley — keeping data on-premises, latency low, and costs predictable.&lt;/p&gt;

&lt;p&gt;Running local AI infrastructure meant we weren't paying per-token API fees to a cloud provider. That's the key to making $199 audits economically viable. Our stack uses open-weight models fine-tuned on a corpus of known Solidity vulnerabilities, EVM bytecode patterns, and audit reports from past exploits.&lt;/p&gt;

&lt;h2&gt;What the auditor actually does&lt;/h2&gt;

&lt;p&gt;When a developer submits a contract, our pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parses the Solidity source&lt;/strong&gt; and builds an abstract syntax tree (AST)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs static analysis&lt;/strong&gt; to flag common issues: reentrancy, integer overflow, unchecked external calls, improper access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passes the AST and source&lt;/strong&gt; to our locally-hosted LLM, which reasons about logic-level vulnerabilities that static tools miss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-references&lt;/strong&gt; findings against a database of known CVEs and DeFi exploit patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates a structured report&lt;/strong&gt; with severity ratings (Critical / High / Medium / Low / Informational) and plain-English remediation advice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole pipeline runs in under 90 seconds for most contracts.&lt;/p&gt;
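&lt;p&gt;Stages 1 and 2 lean on off-the-shelf tooling. Here's a sketch of the static pass using Slither's JSON output; the LLM review, CVE cross-reference, and report renderer are internal services, so they're only named in comments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import subprocess

def static_pass(source_path: str) -&amp;gt; list[dict]:
    # "slither &amp;lt;file&amp;gt; --json -" prints machine-readable findings to stdout.
    # Slither exits non-zero when it finds issues, so no check=True here.
    proc = subprocess.run(
        ["slither", source_path, "--json", "-"],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout)
    return report.get("results", {}).get("detectors", [])

# Downstream (internal, not shown): LLM review of source + AST,
# exploit-database cross-reference, severity-ranked report rendering.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;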

&lt;h2&gt;The stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: Fine-tuned Mistral and CodeLlama variants, served via vLLM on our Idaho GPU cluster (queried as sketched after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis&lt;/strong&gt;: Slither + custom Semgrep rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: FastAPI (Python), PostgreSQL, Redis for job queuing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js with a clean, developer-focused UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Bare-metal servers in Idaho, managed with Ansible&lt;/li&gt;
&lt;/ul&gt;
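&lt;p&gt;Because vLLM exposes an OpenAI-compatible endpoint, the audit workers talk to the cluster with the stock &lt;code&gt;openai&lt;/code&gt; client. A sketch; the host, port, and fine-tune name are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

# vLLM ignores the key's value; it just has to be non-empty.
client = OpenAI(base_url="http://10.0.0.5:8000/v1", api_key="local")

resp = client.chat.completions.create(
    model="snakeriver/solidity-auditor",  # hypothetical fine-tune name
    messages=[{"role": "user", "content": "Audit this function for reentrancy: ..."}],
    temperature=0.0,  # deterministic findings for reproducible reports
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;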

&lt;h2&gt;Results so far&lt;/h2&gt;

&lt;p&gt;In our beta, we've processed over 300 contracts across ERC-20 tokens, NFT minting contracts, and DeFi vaults. Our model correctly flagged 91% of the known vulnerabilities we seeded into test contracts, and surfaced several real issues in production codebases that developers hadn't caught.&lt;/p&gt;

&lt;p&gt;One beta user — a small DeFi team — found a critical reentrancy vulnerability in their staking contract before launch. That $199 audit potentially saved their users from a six-figure exploit.&lt;/p&gt;

&lt;h2&gt;Try it yourself&lt;/h2&gt;

&lt;p&gt;The auditor is live at &lt;strong&gt;&lt;a href="https://audit.snakeriverai.com" rel="noopener noreferrer"&gt;audit.snakeriverai.com&lt;/a&gt;&lt;/strong&gt;. Paste in your contract address or upload your Solidity source, and you'll have a full report in minutes.&lt;/p&gt;

&lt;p&gt;We're actively improving the model, expanding support for Vyper contracts, and building out integrations with GitHub Actions so audits can run automatically in CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;Security shouldn't be a luxury. If you're shipping smart contracts, give it a try — and let us know what you think in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>blockchain</category>
      <category>security</category>
      <category>web3</category>
    </item>
  </channel>
</rss>
