<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: BrethofAI</title>
    <description>The latest articles on DEV Community by BrethofAI (@brethofai).</description>
    <link>https://dev.to/brethofai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878857%2Fd0b3f37e-54d4-4266-be08-06d204e8bb1a.jpg</url>
      <title>DEV Community: BrethofAI</title>
      <link>https://dev.to/brethofai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brethofai"/>
    <language>en</language>
    <item>
      <title>The local voice stack that beats the cloud at its own benchmarks</title>
      <dc:creator>BrethofAI</dc:creator>
      <pubDate>Mon, 25 May 2026 20:57:05 +0000</pubDate>
      <link>https://dev.to/brethofai/the-local-voice-stack-that-beats-the-cloud-at-its-own-benchmarks-449d</link>
      <guid>https://dev.to/brethofai/the-local-voice-stack-that-beats-the-cloud-at-its-own-benchmarks-449d</guid>
      <description>&lt;p&gt;&lt;strong&gt;Brethof Voice Pro 2.0 — offline voice-to-text and 38-language translation, 100% on your machine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Almost every major dictation tool (Dragon, Otter, Google, Apple, the cloud transcription service of the week) captures your voice on your machine, streams it to a data centre, transcribes it there, and sends text back. Sometimes the audio is stored. Sometimes it trains a model. Sometimes it's 'anonymised', a word that stopped meaning much years ago.&lt;/p&gt;

&lt;p&gt;Watch what people actually dictate and you see why that matters: medical notes, legal drafts, interviews with named sources, therapy summaries, deal memos, personal journals. The most sensitive text a person produces — uploaded by default, often against HIPAA, GDPR, or plain decency, because there was no alternative.&lt;/p&gt;

&lt;p&gt;Brethof Voice Pro is the alternative, and 2.0 is the release where 'local' stops being a compromise: it transcribes, &lt;strong&gt;translates&lt;/strong&gt;, dictates into any app, and trains on your own voice — all on your hardware, with no cloud mode to forget to switch off.&lt;/p&gt;

&lt;h2&gt;
  
  
  The engine: GGUF + llama.cpp, 5–7× faster than Whisper
&lt;/h2&gt;

&lt;p&gt;Voice Pro runs Qwen3-ASR on &lt;a href="https://github.com/ggml-org/llama.cpp" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt; with GGUF-quantised models. What that buys you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5–7× faster transcription than Whisper&lt;/strong&gt;, with a &lt;strong&gt;~400 ms cold start&lt;/strong&gt; — weights are memory-mapped, so the first hotkey press after a reboot is already listening.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An 83 MB install&lt;/strong&gt; on Windows (161 MB on Linux) — one binary that runs on &lt;strong&gt;CPU, NVIDIA, AMD, and Intel GPUs via Vulkan&lt;/strong&gt;. No CUDA-only lock-in, no runtime wheels to match to your hardware.&lt;/li&gt;
&lt;li&gt;A genuinely &lt;strong&gt;state-of-the-art&lt;/strong&gt; base model. Qwen3-ASR posts &lt;strong&gt;1.84% average word error rate&lt;/strong&gt; across a 10-language test and &lt;strong&gt;4.5% on English&lt;/strong&gt; — where OpenAI's Whisper Large-v3 sits at &lt;strong&gt;7.4%&lt;/strong&gt;. Its language identification is &lt;strong&gt;97.9%&lt;/strong&gt; accurate across 30 languages, vs Whisper Large-v3's 94.1%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smaller, faster, and more accurate than the model everyone benchmarks against — running entirely on your box.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new in 2.0: offline translation
&lt;/h2&gt;

&lt;p&gt;The headline feature is &lt;strong&gt;translation that never leaves your machine&lt;/strong&gt;, across &lt;strong&gt;38 languages&lt;/strong&gt;, powered by Tencent's &lt;strong&gt;Hunyuan-MT2&lt;/strong&gt; (open-sourced May 2026). It earns the billing: the Hunyuan-MT line took &lt;strong&gt;first place in 30 of 31 categories at WMT25&lt;/strong&gt;, and MT2 is a step beyond it — its translation quality is comparable to Google Gemini 3.1 Pro on the FLORES-200 benchmark (XCOMET-XXL), in a model small enough to run on your own GPU.&lt;/p&gt;

&lt;p&gt;We benchmarked both tiers ourselves — COMET-22, higher is better, across EN↔Polish, EN→Chinese, German, and Arabic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Size on disk&lt;/th&gt;
&lt;th&gt;COMET-22&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fast (1.8B)&lt;/td&gt;
&lt;td&gt;~1 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87.6&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality (7B)&lt;/td&gt;
&lt;td&gt;~4.3 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;89.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both run locally: sub-second on a GPU, and the Fast tier is sub-second even on CPU. Because the compute device is selected per engine, you can run &lt;strong&gt;ASR on one GPU and translation on another&lt;/strong&gt;, or pin the 7B model to CPU on a VRAM-tight laptop.&lt;/p&gt;

&lt;p&gt;Translation shows up everywhere transcription does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transcribe popup&lt;/strong&gt; — a 'Translate to' dropdown on file, mic, and system-audio capture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice keyboard&lt;/strong&gt; — pick one or several targets; it types the &lt;em&gt;translation&lt;/em&gt; (one per line, inline, or primary-only).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subtitle translator&lt;/strong&gt; — translate every cue of an SRT/VTT, keep the timings, optional &lt;strong&gt;bilingual&lt;/strong&gt; mode (source line with the translation beneath).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The core, end to end
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Transcription&lt;/strong&gt; takes three inputs in one popup: an &lt;strong&gt;audio or video file&lt;/strong&gt; (drag-and-drop; it pulls the track out of mp4/mkv/mov/webm and a dozen more formats), the &lt;strong&gt;microphone&lt;/strong&gt;, or &lt;strong&gt;system audio&lt;/strong&gt; — whatever is playing on your speakers, so you can capture a meeting, a browser tab, or a video. Output is plain text or &lt;strong&gt;SRT&lt;/strong&gt; with timestamps; add the optional Forced Aligner for &lt;strong&gt;word-level&lt;/strong&gt; timestamps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good for: transcribing interviews, turning a recorded talk into subtitles, capturing a call you're in without a bot joining the room.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The voice keyboard&lt;/strong&gt; is push-to-talk dictation into &lt;em&gt;any&lt;/em&gt; focused app. Default &lt;strong&gt;F9&lt;/strong&gt;, hold-to-talk or toggle, optional right-mouse trigger; it injects text at the OS level — editor, browser, terminal, chat box. Turn on live translation and you speak English while it types Polish.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Good for: dictating commit messages into your IDE, replying in a language you read better than you write, drafting hands-free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Hotwords&lt;/strong&gt; do two jobs from one field: they bias ASR toward your brand names and jargon (so 'VFIO' stops becoming 'VEAF1'), and they pin terminology for the translator. &lt;strong&gt;Noise reduction&lt;/strong&gt; (DeepFilter) is included but &lt;strong&gt;off by default&lt;/strong&gt; — it hurts quality on short clean clips, so it's there for noisy rooms when you need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Train it on your own voice — and beat the big model
&lt;/h2&gt;

&lt;p&gt;This is the part the cloud can't do. Every time you correct a misheard word, the audio-and-correction pair is saved to a local dataset, and the main window shows your running sample count. One click runs a &lt;strong&gt;LoRA fine-tune&lt;/strong&gt; (it auto-selects an NVIDIA CUDA backend if you have one, CPU otherwise), then merges and exports the result to GGUF — and you switch to your personal model &lt;strong&gt;right from the main screen&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Does it actually work? We fine-tuned the small &lt;strong&gt;0.6B&lt;/strong&gt; model on about 11 hours of Polish. It scored &lt;strong&gt;6.10% WER — beating Whisper Large-v3's 8.40%&lt;/strong&gt; on the same audio. A model a fraction of the size, adapted on-device to one language and voice, out-performing the big general model. Nothing left the machine to get there.&lt;/p&gt;
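
&lt;p&gt;For readers new to the metric, and assuming the standard definition (this post does not spell out its scoring setup): word error rate is &lt;code&gt;WER = (S + D + I) / N&lt;/code&gt;, where S, D, and I count the substituted, deleted, and inserted words against a reference transcript of N words. Going from 8.40% to 6.10% is a relative reduction of &lt;code&gt;(8.40 - 6.10) / 8.40 ≈ 27%&lt;/code&gt;, or roughly 23 fewer errors per 1,000 reference words.&lt;/p&gt;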

&lt;blockquote&gt;
&lt;p&gt;Good for: strong accents, field vocabulary (medical, legal, engineering), or simply grinding your error rate down over a few weeks of normal use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  For developers: the MCP server
&lt;/h2&gt;

&lt;p&gt;Voice Pro ships as a &lt;strong&gt;Model Context Protocol server&lt;/strong&gt; with &lt;strong&gt;19 tools&lt;/strong&gt; exposing ASR and translation to any MCP agent: Claude Desktop, Claude Code, Cursor, Cline, OpenClaw, Hermes. It is the same binary, started with &lt;code&gt;--mcp&lt;/code&gt;; transport is stdio, so there's no port, no localhost binding, no firewall prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brethof-voice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"brethof-voice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your agent can transcribe files, record and transcribe the mic, translate text and SRTs, switch compute devices, and manage voice profiles — locally, with no API keys and no per-minute billing. 'Transcribe this interview and give me a German SRT' becomes a fully offline operation.&lt;/p&gt;
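
&lt;p&gt;As a rough sketch of what travels over that stdio pipe (assuming the standard MCP wire format, not anything specific to Voice Pro): once the client has completed the usual &lt;code&gt;initialize&lt;/code&gt; handshake, it can ask the server to enumerate its tools with a JSON-RPC 2.0 &lt;code&gt;tools/list&lt;/code&gt; request. It is pretty-printed here for readability; on the wire each message is a single newline-terminated line, and the &lt;code&gt;id&lt;/code&gt; value is arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The response lists each tool's name, description, and input schema, which is how an agent discovers what it can call without any hard-coded tool names.&lt;/p&gt;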

&lt;blockquote&gt;
&lt;p&gt;Good for: agent pipelines that process audio without shipping it to a third party, batch subtitle jobs, and voice-driven tooling you actually control.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Languages, stated honestly
&lt;/h2&gt;

&lt;p&gt;No rounded-up number:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transcription: 30 selectable languages + 22 Chinese dialects&lt;/strong&gt; the model recognises automatically (52 languages and dialects in total), plus auto-detect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation: 38 languages&lt;/strong&gt; via Hunyuan-MT2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;23 languages are covered by both transcription and translation&lt;/strong&gt;: speak any of them, see it written, then see it translated into any of the others.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two lists don't perfectly overlap (ASR handles languages such as Danish, Greek, Finnish, and Swedish that translation doesn't; translation handles languages such as Hindi, Bengali, Tamil, and Ukrainian that ASR doesn't surface), so the &lt;a href="https://brethof.ai/voice/tour/" rel="noopener noreferrer"&gt;feature tour&lt;/a&gt; publishes the full per-language table with a tick in each column. No asterisks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The privacy guarantee
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No cloud mode.&lt;/strong&gt; There is no toggle to send audio to a server for better accuracy. Your CPU or GPU is the only option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No telemetry.&lt;/strong&gt; No usage stats, no crash phone-home. The only network calls are a license check, an update check, and the model downloads &lt;em&gt;you&lt;/em&gt; trigger — all documented, all disableable.&lt;/li&gt;
&lt;li&gt;
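&lt;strong&gt;Audio never hits disk during transcription.&lt;/strong&gt; The buffer lives in RAM and is freed the moment the text is produced. The one exception is the audio-and-correction pairs you save for fine-tuning, which stay in a local dataset on your machine. Otherwise there is nothing to leak and nothing to recover.&lt;/li&gt;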
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Your voice is the most personal data you generate. It shouldn't leave your machine unless you explicitly send it somewhere. That isn't a tagline — it's why the product exists.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Platforms
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linux&lt;/strong&gt; x86_64 — Ubuntu 22.04+, Fedora 38+, Arch, Debian 12+, CachyOS, openSUSE; X11 and Wayland; a single portable binary, no install.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows&lt;/strong&gt; x64 — 10 (21H2+) and 11; per-user graphical installer, no admin rights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt; — not yet; on the roadmap, no ETA.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It runs CPU-only on 8 GB of RAM with an AVX2-capable CPU. For GPU acceleration you need &lt;strong&gt;Vulkan 1.2+&lt;/strong&gt; drivers, which means &lt;strong&gt;NVIDIA, AMD, and Intel Arc all work&lt;/strong&gt; from the same build, not just CUDA cards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Pay once, own it forever — &lt;strong&gt;no subscription&lt;/strong&gt;. There's a &lt;strong&gt;14-day free trial&lt;/strong&gt; with every feature unlocked and no credit card. Download for Linux or Windows at &lt;strong&gt;&lt;a href="https://brethof.ai/voice" rel="noopener noreferrer"&gt;brethof.ai/voice&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Local. Private. Slightly opinionated.&lt;/p&gt;

</description>
      <category>localai</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>Don't summarize your memory — search it</title>
      <dc:creator>BrethofAI</dc:creator>
      <pubDate>Fri, 22 May 2026 15:54:00 +0000</pubDate>
      <link>https://dev.to/brethofai/dont-summarize-your-memory-search-it-2gef</link>
      <guid>https://dev.to/brethofai/dont-summarize-your-memory-search-it-2gef</guid>
      <description>&lt;p&gt;Every long session with an AI coding agent eventually hits the same wall: the context window fills up, the conversation gets &lt;strong&gt;compacted&lt;/strong&gt;, and a summary takes the place of what actually happened. Summaries are lossy by design. The decision you made three sessions ago, the reason you ruled out approach B, the exact path you fixed last Tuesday — quietly gone, because something decided they weren't important enough to keep.&lt;/p&gt;

&lt;p&gt;I got tired of re-explaining my own project to my own assistant. So I built &lt;strong&gt;brethof-mind&lt;/strong&gt;: long-term memory for Claude Code (and Claude Desktop), built on SurrealDB. The core idea is in the title — instead of &lt;em&gt;summarizing&lt;/em&gt; your history down to fit, keep all of it and &lt;strong&gt;search&lt;/strong&gt; it.&lt;/p&gt;

&lt;p&gt;It's open source (MIT), runs 100% on your machine, and talks to no external API.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/BrethofAI/brethof-mind" rel="noopener noreferrer"&gt;https://github.com/BrethofAI/brethof-mind&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Two memories, not one
&lt;/h2&gt;

&lt;p&gt;Most "memory" tools give you a single bucket of notes. brethof-mind keeps &lt;strong&gt;two layers&lt;/strong&gt;, because they answer different questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Curated memory&lt;/strong&gt; — the things you &lt;em&gt;decide&lt;/em&gt; are worth pinning: architecture decisions, locked rules, project status, bugs and their fixes. Small, high-signal, hand-or-agent-curated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full chat archive&lt;/strong&gt; — every session you've ever had, stored verbatim and searchable. This is the safety net: when a summary would have dropped a detail, the raw exchange is still there to retrieve.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The curated layer answers &lt;em&gt;"what did we decide?"&lt;/em&gt; The archive answers &lt;em&gt;"what did we actually say back in March?"&lt;/em&gt; Together they mean a compaction is no longer a memory wipe — it's just the working context shrinking while the real record stays intact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three ways to search
&lt;/h2&gt;

&lt;p&gt;Different questions want different retrieval. brethof-mind exposes all three over MCP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full-text (BM25)&lt;/strong&gt; — when you know the words. SurrealDB's full-text index, lowercased + stemmed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector similarity (HNSW)&lt;/strong&gt; — when you know the &lt;em&gt;meaning&lt;/em&gt; but not the words. Embeddings come from &lt;code&gt;fastembed&lt;/code&gt; (all-MiniLM-L6-v2, 384-dim) — local, fast, no API key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt; — records link to each other (decision → supersedes → decision; episode → covers → topic), so you can walk relationships, not just match text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are 7 MCP tools in total — &lt;code&gt;semantic_search&lt;/code&gt;, &lt;code&gt;search_memory&lt;/code&gt;, &lt;code&gt;search_chat&lt;/code&gt;, &lt;code&gt;query_raw&lt;/code&gt;, &lt;code&gt;save_memory&lt;/code&gt;, &lt;code&gt;save_commit&lt;/code&gt;, &lt;code&gt;load_project&lt;/code&gt; — so the agent can pick the right retrieval for the question instead of being stuck with one.&lt;/p&gt;
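
&lt;p&gt;To make that concrete, here is a hedged sketch of the JSON-RPC 2.0 message an MCP client might send to invoke one of these tools. The &lt;code&gt;tools/call&lt;/code&gt; method and the &lt;code&gt;name&lt;/code&gt;/&lt;code&gt;arguments&lt;/code&gt; envelope come from the MCP specification, and &lt;code&gt;semantic_search&lt;/code&gt; is one of the tools listed above, but the &lt;code&gt;query&lt;/code&gt; argument name is an assumption for illustration; the tool's real input schema is whatever the server reports.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "semantic_search",
    "arguments": { "query": "why did we rule out approach B" }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In practice the agent builds this message itself; the point is only that each retrieval mode is an ordinary tool call, selected by name.&lt;/p&gt;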

&lt;h2&gt;
  
  
  100% local stack
&lt;/h2&gt;

&lt;p&gt;No cloud, no telemetry, no keys leaving your box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SurrealDB&lt;/strong&gt; for storage (vector + full-text + graph in one engine).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fastembed&lt;/strong&gt; for embeddings, on CPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastMCP&lt;/strong&gt; over stdio for the server.&lt;/li&gt;
&lt;li&gt;Credentials via env vars; projects configured in a simple &lt;code&gt;projects.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/BrethofAI/brethof-mind
&lt;span class="nb"&gt;cd &lt;/span&gt;brethof-mind

&lt;span class="c"&gt;# 1. Bring up SurrealDB&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# 2. Configure&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env                    &lt;span class="c"&gt;# set DB creds&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;projects.example.json projects.json
python mcp-server/scripts/init_db.py    &lt;span class="c"&gt;# create namespace + schema + indexes&lt;/span&gt;

&lt;span class="c"&gt;# 3. Register the MCP server with Claude Code (claude mcp add ...)&lt;/span&gt;
&lt;span class="c"&gt;# 4. Drop the hooks into your Claude settings (see settings.example.json)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full steps are in the README.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hooks are where it gets nice
&lt;/h2&gt;

&lt;p&gt;The MCP tools are useful on demand, but the &lt;strong&gt;hooks&lt;/strong&gt; make memory ambient — you don't have to remember to remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SessionStart&lt;/strong&gt; loads the relevant project memory into context the moment you open a session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UserPromptSubmit&lt;/strong&gt; nudges the agent to &lt;em&gt;search memory first&lt;/em&gt; before answering questions about past decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop&lt;/strong&gt; archives the session into the searchable chat history when you're done.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;commit hook&lt;/strong&gt; records each commit as a memory record, so your project history and your conversation history live in the same searchable place.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: start a fresh session and your agent already knows where the project stands — no re-briefing.&lt;/p&gt;
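
&lt;p&gt;For orientation, Claude Code declares hooks under a &lt;code&gt;hooks&lt;/code&gt; key in a settings file, keyed by event name. The sketch below shows only that general shape for the three session events; the command paths are placeholders, not the repository's actual scripts, so treat &lt;code&gt;settings.example.json&lt;/code&gt; in the repo as the authoritative version.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "SessionStart": [
      { "hooks": [ { "type": "command", "command": "/path/to/your/session-start-script" } ] }
    ],
    "UserPromptSubmit": [
      { "hooks": [ { "type": "command", "command": "/path/to/your/prompt-submit-script" } ] }
    ],
    "Stop": [
      { "hooks": [ { "type": "command", "command": "/path/to/your/stop-script" } ] }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;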

&lt;h2&gt;
  
  
  Works with
&lt;/h2&gt;

&lt;p&gt;Claude Code and Claude Desktop today (Desktop runs the Claude Code engine under the hood, so it gets the full hooks experience). OpenClaw and Hermes integrations are next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's free
&lt;/h2&gt;

&lt;p&gt;brethof-mind is MIT and will stay free. It comes from the team behind &lt;strong&gt;Brethof Voice Pro&lt;/strong&gt; (local, offline voice-to-text) and follows the same principle: your data stays on your machine. This is the tooling we use ourselves, and we're sharing it because the "summarize-your-memory" default deserves a better answer.&lt;/p&gt;

&lt;p&gt;If you try it, I'd genuinely like feedback on the hook design — that's the part with the most room to get smarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/BrethofAI/brethof-mind" rel="noopener noreferrer"&gt;https://github.com/BrethofAI/brethof-mind&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>claude</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Transcribe, Translate, Timestamps, Fine tune, MCP server ALL in 1 app</title>
      <dc:creator>BrethofAI</dc:creator>
      <pubDate>Fri, 22 May 2026 11:00:59 +0000</pubDate>
      <link>https://dev.to/brethofai/transcribe-translate-timestamps-fine-tune-mcp-server-all-in-1-app-470a</link>
      <guid>https://dev.to/brethofai/transcribe-translate-timestamps-fine-tune-mcp-server-all-in-1-app-470a</guid>
      <description>&lt;h1&gt;
  
  
  Brethof Voice Pro v2.0.0 — offline speech-to-text, translation, and a voice keyboard in one app
&lt;/h1&gt;

&lt;p&gt;Your voice, transcribed and translated on your own machine. No cloud, no subscription, nothing leaving your laptop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg09rxcxlz76l9a7jwzgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg09rxcxlz76l9a7jwzgg.png" alt="Brethof Voice Pro v2.0.0" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Brethof Voice Pro just hit &lt;strong&gt;v2.0.0&lt;/strong&gt; — and it's no longer "just" dictation. It's a full local voice + translation layer for your desktop &lt;em&gt;and&lt;/em&gt; your AI stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🎙️ &lt;strong&gt;Transcribe 30 languages + 22 Chinese dialects&lt;/strong&gt; (52 in total) — from a file, your mic, or your system audio (meetings, videos, anything playing through your speakers).&lt;/li&gt;
&lt;li&gt;🌍 &lt;strong&gt;Translate across 38 languages&lt;/strong&gt;, fully offline. Choose the fast model or the quality one (Tencent's Hunyuan-MT — #1 in 30 of 31 categories at WMT25).&lt;/li&gt;
&lt;li&gt;⌨️ &lt;strong&gt;Voice keyboard&lt;/strong&gt; — dictate into &lt;em&gt;any&lt;/em&gt; app with a hotkey. New in v2.0: pick several target languages and it types the &lt;strong&gt;translation&lt;/strong&gt; as you speak.&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Timestamped subtitles&lt;/strong&gt; — export SRT/VTT, sentence- or word-level.&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Fine-tune it on you&lt;/strong&gt; — every correction trains a personal model (one-click LoRA), so it learns your voice and your vocabulary.&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;MCP server&lt;/strong&gt; — 19 tools so Claude or any MCP client can drive transcription and translation inside your own pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why it's different
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% local.&lt;/strong&gt; Audio and translations never leave your machine. Linux &amp;amp; Windows; GPU optional — runs on a plain laptop CPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All in one app.&lt;/strong&gt; Transcribe, translate, subtitle, dictate, fine-tune — no stitching five tools together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No subscription.&lt;/strong&gt; Pay once, own it forever. &lt;strong&gt;14-day free trial, no credit card.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://brethof.ai/voice" rel="noopener noreferrer"&gt;brethof.ai/voice&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Built a $49 Voice-to-Text App That Never Touches the Cloud</title>
      <dc:creator>BrethofAI</dc:creator>
      <pubDate>Thu, 16 Apr 2026 07:32:31 +0000</pubDate>
      <link>https://dev.to/brethofai/i-built-a-49-voice-to-text-app-that-never-touches-the-cloud-4b4g</link>
      <guid>https://dev.to/brethofai/i-built-a-49-voice-to-text-app-that-never-touches-the-cloud-4b4g</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built Brethof Voice Pro
&lt;/h1&gt;

&lt;p&gt;I got tired of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Voice-to-text tools that only work well in English&lt;/li&gt;
&lt;li&gt;Every good STT solution either costs about $700 (Dragon) or charges a monthly fee to send your voice to the cloud&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So I built something different.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;Brethof Voice Pro is a desktop app (Windows + Linux) that converts speech to text using AI — entirely on your device. Press Ctrl+D, speak, text appears where your cursor is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero cloud calls during transcription.&lt;/strong&gt; Models download once on first launch, then it works fully offline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Qwen3-ASR engine&lt;/strong&gt; — 1.84% word error rate across 10 languages (arXiv 2601.21337)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GGUF models via llama.cpp&lt;/strong&gt; — 6 quantization tiers from 1 GB to 3.2 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulkan GPU acceleration&lt;/strong&gt; on NVIDIA, AMD, and Intel GPUs, with a CPU-only fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepFilter&lt;/strong&gt; noise reduction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;36 languages&lt;/strong&gt; with auto-detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why It Matters
&lt;/h2&gt;

&lt;p&gt;If you speak Thai, Polish, Arabic, Vietnamese, Korean, or dozens of other languages — there has been &lt;strong&gt;no good voice-to-text option&lt;/strong&gt; for you. Dragon doesn't support most languages. Google's API charges per-minute and requires internet. Whisper's accuracy on non-English languages is mediocre.&lt;/p&gt;

&lt;p&gt;Qwen3-ASR changes this. State-of-the-art accuracy across 36 languages, running locally on consumer hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;$49 one-time. Perpetual license. No subscription.&lt;/p&gt;

&lt;p&gt;Compare: Dragon $699. Otter.ai $100-240/year. Google STT per-minute cloud pricing.&lt;/p&gt;

&lt;p&gt;14-day free trial, no credit card: &lt;a href="https://brethof.com" rel="noopener noreferrer"&gt;brethof.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions about the architecture, GGUF quantization, or Vulkan vs CUDA tradeoffs.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdla86uwsr756ld39t3ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdla86uwsr756ld39t3ph.png" alt=" " width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>privacy</category>
      <category>linux</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
