<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Snake River Ai</title>
    <description>The latest articles on DEV Community by Snake River Ai (@cryforyou22).</description>
    <link>https://dev.to/cryforyou22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867005%2Fea79b068-34b5-45fa-b3fb-3c79a8378903.jpg</url>
      <title>DEV Community: Snake River Ai</title>
      <link>https://dev.to/cryforyou22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cryforyou22"/>
    <language>en</language>
    <item>
      <title>I built a sovereign voice layer that routes to 11 AI providers — here's the architecture</title>
      <dc:creator>Snake River Ai</dc:creator>
      <pubDate>Wed, 29 Apr 2026 21:53:31 +0000</pubDate>
      <link>https://dev.to/cryforyou22/i-built-a-sovereign-voice-layer-that-routes-to-11-ai-providers-heres-the-architecture-1b52</link>
      <guid>https://dev.to/cryforyou22/i-built-a-sovereign-voice-layer-that-routes-to-11-ai-providers-heres-the-architecture-1b52</guid>
      <description>&lt;p&gt;After two years of bouncing between Claude desktop, ChatGPT voice, Gemini, and a half-dozen Ollama frontends, I got tired of the wake-word thrash. Every assistant assumes you've picked their team forever.&lt;/p&gt;

&lt;p&gt;So I built BRAGI — a voice layer that runs locally, listens locally, and routes to whichever AI I tell it to. Including the one running on the same machine.&lt;/p&gt;

&lt;p&gt;This post is the architecture, not a sales pitch. If you've been thinking about building something similar, here's what I learned shipping v0.2.&lt;/p&gt;

&lt;h2&gt;The pipeline&lt;/h2&gt;

&lt;pre&gt;Mic input
  ↓
openwakeword (local) — "Hey Jarvis"
  ↓
faster-whisper medium (local, GPU optional)
  ↓
Provider router (settings UI picks destination)
  ↓
[Cloud: Claude / OpenAI / Gemini / Grok / Groq / Together / HuggingFace]
[Local: Ollama / LM Studio / FREYA / Echo]
  ↓
TTS (eSpeak free, OpenAI Nova BYOK)
  ↓
Speaker output&lt;/pre&gt;

&lt;p&gt;Audio never leaves the machine. Only transcribed text goes to whichever cloud you picked, if any.&lt;/p&gt;
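&lt;p&gt;To make the flow concrete, here's a minimal sketch of the glue loop. The stage objects and method names (&lt;code&gt;wait_for_wake&lt;/code&gt;, &lt;code&gt;transcribe&lt;/code&gt;, &lt;code&gt;speak&lt;/code&gt;) are illustrative, not BRAGI's actual internals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def assistant_loop(wake, stt, router, tts):
    # Hypothetical glue: every stage runs locally except the provider,
    # which may be cloud or local depending on user settings.
    while True:
        audio = await wake.wait_for_wake()   # blocks until "Hey Jarvis"
        text = stt.transcribe(audio)         # faster-whisper, on-device
        provider = router.current()          # user-selected destination
        parts = []
        async for token in provider.respond(text, history=[]):
            parts.append(token)
        tts.speak("".join(parts))            # eSpeak or OpenAI Nova

# asyncio.run(assistant_loop(wake, stt, router, tts))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;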
&lt;h2&gt;Wake word&lt;/h2&gt;

&lt;p&gt;openwakeword is the right call for a sovereign product. Picovoice's Porcupine has better detection quality, but it locks you into a paid commercial license. openwakeword is Apache 2.0 and runs on CPU.&lt;/p&gt;

&lt;p&gt;The catch: training your own custom model requires matching the feature dimensions to whichever preprocessor version you're targeting. I burned half a day on a model that had 96×103 features when openwakeword expected 32×147. v0.2 ships with the stock "Hey Jarvis" model and includes the custom "Hey BRAGI" model for users with compatible hardware.&lt;/p&gt;
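&lt;p&gt;If you stick with a stock model, the detection loop itself is short. A minimal sketch against openwakeword's Python API; the frame size comes from its docs, and the 0.5 threshold is something you'd tune locally:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["hey_jarvis"])  # stock model, no training needed

def is_wake(frame: np.ndarray) -&amp;gt; bool:
    # frame: 1280 samples (80 ms) of 16 kHz int16 PCM from the mic
    scores = oww.predict(frame)  # {model_name: activation score}
    return any(score &amp;gt; 0.5 for score in scores.values())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;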
&lt;h2&gt;STT&lt;/h2&gt;

&lt;p&gt;faster-whisper medium on CUDA is the sweet spot. Tiny is too inaccurate for real conversation; large is overkill for short voice commands. Medium gets ~1 second latency on a midrange GPU and handles multilingual input out of the box.&lt;/p&gt;

&lt;p&gt;Critical detail: instantiate Whisper once at startup, never per-request. First inference call takes 5-10 seconds to warm CUDA. Users won't tolerate that on every wake.&lt;/p&gt;
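&lt;p&gt;In practice that means a module-level singleton plus a throwaway warm-up call at boot. A sketch with faster-whisper; the &lt;code&gt;silence.wav&lt;/code&gt; warm-up clip is my stand-in, any short audio file works:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from faster_whisper import WhisperModel

# Load once at startup, never per request.
model = WhisperModel("medium", device="cuda", compute_type="float16")

def warm_up() -&amp;gt; None:
    # One throwaway pass so the first real wake isn't 5-10 s slow.
    segments, _info = model.transcribe("silence.wav")
    list(segments)  # transcribe() is lazy; drain the generator

def transcribe(path: str) -&amp;gt; str:
    segments, _info = model.transcribe(path, beam_size=5)
    return " ".join(seg.text.strip() for seg in segments)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;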
&lt;h2&gt;The router&lt;/h2&gt;

&lt;p&gt;This was the hardest part. Each provider has a different SDK, a different streaming format, and a different auth pattern. The router abstracts all of that into one interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_ready&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each provider implementation handles its own SDK quirks. The router just picks one based on user settings or voice command ("BRAGI, switch to Claude") and calls &lt;code&gt;respond()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For local models I support both Ollama (HTTP API) and LM Studio (OpenAI-compatible HTTP API). Both run on the user's machine. Both look identical to the router.&lt;/p&gt;
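&lt;p&gt;For a concrete sense of what one adapter looks like, here's a sketch of an Ollama provider against its &lt;code&gt;/api/chat&lt;/code&gt; streaming endpoint, which returns newline-delimited JSON. httpx and the dict-shaped history are my assumptions, not BRAGI's exact code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import httpx

class OllamaProvider:
    def __init__(self, model: str = "llama3", base: str = "http://127.0.0.1:11434"):
        self.model, self.base = model, base

    def name(self) -&amp;gt; str:
        return f"ollama:{self.model}"

    def is_ready(self) -&amp;gt; bool:
        try:  # a running Ollama answers /api/tags with its model list
            return httpx.get(f"{self.base}/api/tags", timeout=2).status_code == 200
        except httpx.HTTPError:
            return False

    async def respond(self, prompt: str, history: list):
        messages = [*history, {"role": "user", "content": prompt}]
        payload = {"model": self.model, "messages": messages, "stream": True}
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{self.base}/api/chat", json=payload) as r:
                async for line in r.aiter_lines():
                    if not line:
                        continue
                    chunk = json.loads(line)
                    if not chunk.get("done"):
                        yield chunk["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The LM Studio adapter is the same shape with a different URL and payload, since it speaks the OpenAI wire format.&lt;/p&gt;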

&lt;h2&gt;TTS&lt;/h2&gt;

&lt;p&gt;eSpeak ships with the installer because it's free, offline, and covers 100+ languages. It sounds robotic. That's fine. People who want premium voice can paste an OpenAI API key and use Nova.&lt;/p&gt;

&lt;p&gt;I tried Kokoro for higher-quality offline TTS. It worked great in dev, but production builds kept hitting a 404 on the default voice file on Hugging Face. So v0.2 ships eSpeak as the default and Kokoro as best-effort.&lt;/p&gt;
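&lt;p&gt;The voice selection is a dumb fallback chain rather than anything clever. A sketch; the OpenAI call follows their TTS API, and playback of the written file is elided:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

def speak(text: str, openai_key: str | None = None) -&amp;gt; None:
    if openai_key:
        from openai import OpenAI
        resp = OpenAI(api_key=openai_key).audio.speech.create(
            model="tts-1", voice="nova", input=text
        )
        resp.write_to_file("reply.mp3")  # then play it back locally
    else:
        # eSpeak ships with the installer and sits on PATH
        subprocess.run(["espeak", text], check=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;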

&lt;h2&gt;The settings UI&lt;/h2&gt;

&lt;p&gt;Local web UI on &lt;a href="http://127.0.0.1:7777" rel="noopener noreferrer"&gt;http://127.0.0.1:7777&lt;/a&gt;. Configure providers, paste API keys, pick voices, manage license. Page lives on the user's machine. No account, no login, no cloud dashboard.&lt;/p&gt;

&lt;p&gt;API keys live in a local vault. They never leave the machine. The product is sovereignty — that has to be true at every layer.&lt;/p&gt;
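&lt;p&gt;The sovereignty property of the dashboard comes down to one line: bind the server to loopback only. A trimmed sketch (routes and vault wiring omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/api/providers")
def providers():
    # Illustrative route; the real UI also manages keys and voices.
    return {"active": "ollama", "available": ["claude", "openai", "ollama"]}

if __name__ == "__main__":
    # 127.0.0.1 means nothing else on the LAN can reach this page.
    uvicorn.run(app, host="127.0.0.1", port=7777)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;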

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;openwakeword for wake detection&lt;/li&gt;
&lt;li&gt;faster-whisper for STT&lt;/li&gt;
&lt;li&gt;eSpeak / OpenAI Nova for TTS&lt;/li&gt;
&lt;li&gt;FastAPI for the local settings server&lt;/li&gt;
&lt;li&gt;pythonw.exe in tray mode for daily use&lt;/li&gt;
&lt;li&gt;PyInstaller for bundling&lt;/li&gt;
&lt;li&gt;NSIS for the Windows installer&lt;/li&gt;
&lt;li&gt;~169MB installer, Win10/11&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom wake word training is harder than the docs admit.&lt;/strong&gt; openwakeword's preprocessor is versioned and the feature dims have to match exactly. Document this for users who want to train their own.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PyInstaller + 4GB CUDA torch builds blow past NSIS's 2GB single-file limit.&lt;/strong&gt; I had to move torch + Kokoro to a first-run download instead of bundling them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust the embedded Python's &lt;code&gt;python311._pth&lt;/code&gt; defaults.&lt;/strong&gt; User-site contamination from &lt;code&gt;%APPDATA%\Python&lt;/code&gt; will silently break your install. Always launch with the &lt;code&gt;-s -E&lt;/code&gt; flags (see the launcher sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
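&lt;p&gt;For point 3, the launcher looks roughly like this. &lt;code&gt;-s&lt;/code&gt; skips the user site-packages and &lt;code&gt;-E&lt;/code&gt; ignores &lt;code&gt;PYTHONPATH&lt;/code&gt;/&lt;code&gt;PYTHONHOME&lt;/code&gt;; the install path and module name are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

# Tray launcher: embedded interpreter, isolated from any Python the
# user already has installed.
subprocess.Popen(
    [r"C:\Program Files\BRAGI\python\pythonw.exe", "-s", "-E", "-m", "bragi"],
    creationflags=subprocess.CREATE_NO_WINDOW,  # no console flash on Windows
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;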

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;v0.3 will likely add: better Kokoro fallback, custom wake word training UI, multi-room concurrency. The architecture supports it — I just need to ship v0.2 first and see what users actually ask for.&lt;/p&gt;

&lt;p&gt;If you want to see it: &lt;a href="https://clintwave84.gumroad.com/l/leetkd" rel="noopener noreferrer"&gt;clintwave84.gumroad.com/l/leetkd&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've built something similar and want to compare notes — drop a comment. Especially curious how others have handled the provider abstraction across cloud + local.&lt;/p&gt;

&lt;p&gt;— Built by one guy in Idaho. Snake River AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>voice</category>
      <category>opensource</category>
    </item>
    <item>
      <title>We built an AI smart contract auditor for $199 — here's how</title>
      <dc:creator>Snake River Ai</dc:creator>
      <pubDate>Wed, 08 Apr 2026 05:55:02 +0000</pubDate>
      <link>https://dev.to/cryforyou22/we-built-an-ai-smart-contract-auditor-for-199-heres-how-41de</link>
      <guid>https://dev.to/cryforyou22/we-built-an-ai-smart-contract-auditor-for-199-heres-how-41de</guid>
      <description>&lt;p&gt;Smart contract security is a billion-dollar problem. Hacks, exploits, and rug pulls cost the Web3 ecosystem hundreds of millions every year — and most of them stem from bugs that a careful audit would have caught. The problem? Professional audits from top firms can run $20,000 to $100,000+, putting them out of reach for indie developers and small teams.&lt;/p&gt;

&lt;p&gt;We decided to change that. Based out of Boise, Idaho, our team at Snake River AI built a fully automated smart contract auditor that runs for a flat $199 per audit. Here's how we did it — and what we learned along the way.&lt;/p&gt;

&lt;h2&gt;Why Idaho?&lt;/h2&gt;

&lt;p&gt;When people think of AI infrastructure, they picture Silicon Valley server farms or AWS data centers in Virginia. We took a different path. Idaho's energy costs are among the lowest in the country, and the state's investment in renewable power (hydro and wind) made it an attractive location for running GPU workloads sustainably. We stood up our own local inference cluster in the Treasure Valley — keeping data on-premises, latency low, and costs predictable.&lt;/p&gt;

&lt;p&gt;Running local AI infrastructure meant we weren't paying per-token API fees to a cloud provider. That's the key to making $199 audits economically viable. Our stack uses open-weight models fine-tuned on a corpus of known Solidity vulnerabilities, EVM bytecode patterns, and audit reports from past exploits.&lt;/p&gt;

&lt;h2&gt;What the auditor actually does&lt;/h2&gt;

&lt;p&gt;When a developer submits a contract, our pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parses the Solidity source&lt;/strong&gt; and builds an abstract syntax tree (AST)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs static analysis&lt;/strong&gt; to flag common issues: reentrancy, integer overflow, unchecked external calls, improper access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passes the AST and source&lt;/strong&gt; to our locally-hosted LLM, which reasons about logic-level vulnerabilities that static tools miss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-references&lt;/strong&gt; findings against a database of known CVEs and DeFi exploit patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates a structured report&lt;/strong&gt; with severity ratings (Critical / High / Medium / Low / Informational) and plain-English remediation advice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole pipeline runs in under 90 seconds for most contracts.&lt;/p&gt;
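&lt;p&gt;Stages 1 and 2 lean on off-the-shelf tooling. Here's a sketch of the static pass using Slither's JSON output; the LLM review, CVE cross-reference, and report renderer are internal services, so they're only named in comments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import subprocess

def static_pass(source_path: str) -&amp;gt; list[dict]:
    # "slither &amp;lt;file&amp;gt; --json -" prints machine-readable findings to stdout.
    # Slither exits non-zero when it finds issues, so no check=True here.
    proc = subprocess.run(
        ["slither", source_path, "--json", "-"],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout)
    return report.get("results", {}).get("detectors", [])

# Downstream (internal, not shown): LLM review of source + AST,
# exploit-database cross-reference, severity-ranked report rendering.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;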

&lt;h2&gt;The stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: Fine-tuned Mistral and CodeLlama variants, served via vLLM on our Idaho GPU cluster (queried as sketched after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis&lt;/strong&gt;: Slither + custom Semgrep rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: FastAPI (Python), PostgreSQL, Redis for job queuing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js with a clean, developer-focused UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Bare-metal servers in Idaho, managed with Ansible&lt;/li&gt;
&lt;/ul&gt;
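&lt;p&gt;Because vLLM exposes an OpenAI-compatible endpoint, the audit workers talk to the cluster with the stock &lt;code&gt;openai&lt;/code&gt; client. A sketch; the host, port, and fine-tune name are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

# vLLM ignores the key's value; it just has to be non-empty.
client = OpenAI(base_url="http://10.0.0.5:8000/v1", api_key="local")

resp = client.chat.completions.create(
    model="snakeriver/solidity-auditor",  # hypothetical fine-tune name
    messages=[{"role": "user", "content": "Audit this function for reentrancy: ..."}],
    temperature=0.0,  # deterministic findings for reproducible reports
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;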

&lt;h2&gt;Results so far&lt;/h2&gt;

&lt;p&gt;In our beta, we've processed over 300 contracts across ERC-20 tokens, NFT minting contracts, and DeFi vaults. Our model correctly flagged 91% of the known vulnerabilities we seeded into test contracts, and surfaced several real issues in production codebases that developers hadn't caught.&lt;/p&gt;

&lt;p&gt;One beta user — a small DeFi team — found a critical reentrancy vulnerability in their staking contract before launch. That $199 audit potentially saved their users from a six-figure exploit.&lt;/p&gt;

&lt;h2&gt;Try it yourself&lt;/h2&gt;

&lt;p&gt;The auditor is live at &lt;strong&gt;&lt;a href="https://audit.snakeriverai.com" rel="noopener noreferrer"&gt;audit.snakeriverai.com&lt;/a&gt;&lt;/strong&gt;. Paste in your contract address or upload your Solidity source, and you'll have a full report in minutes.&lt;/p&gt;

&lt;p&gt;We're actively improving the model, expanding support for Vyper contracts, and building out integrations with GitHub Actions so audits can run automatically in CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;Security shouldn't be a luxury. If you're shipping smart contracts, give it a try — and let us know what you think in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>blockchain</category>
      <category>security</category>
      <category>web3</category>
    </item>
  </channel>
</rss>
