<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt Macosko</title>
    <description>The latest articles on DEV Community by Matt Macosko (@matt_macosko_f3829cfd86b8).</description>
    <link>https://dev.to/matt_macosko_f3829cfd86b8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3881937%2Fd462eaf2-e4e1-452e-82b5-c8a66e8941d1.jpg</url>
      <title>DEV Community: Matt Macosko</title>
      <link>https://dev.to/matt_macosko_f3829cfd86b8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matt_macosko_f3829cfd86b8"/>
    <language>en</language>
    <item>
      <title>Eight local AI agents on a Mac mini — and the product I'm building from them</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Tue, 26 May 2026 19:04:03 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/eight-local-ai-agents-on-a-mac-mini-and-the-product-im-building-from-them-1i50</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/eight-local-ai-agents-on-a-mac-mini-and-the-product-im-building-from-them-1i50</guid>
      <description>&lt;p&gt;A case study went around recently: a lawyer had wired up 66 AI agents on a Mac mini for his own firm — every one running locally, nothing touching a cloud API — and was looking for a commercial partner before releasing it as open source.&lt;/p&gt;

&lt;p&gt;I read that and realized I had been building the same shape of thing for my own small business for the last year. Eight ambient agents, all running on Apple Silicon, none of them touching cloud APIs. I had not productized any of them. I had also not gotten paid for any of them.&lt;/p&gt;

&lt;p&gt;This post is about what I'm doing about that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The asset base
&lt;/h2&gt;

&lt;p&gt;I shipped a repo called claude-code-local. It's an MLX server that wraps three local language models (Gemma 4 31B, Llama 3.3 70B, Qwen 3.5 122B MoE) behind an OpenAI-compatible API. The setup script picks the right model for your hardware, downloads it, and puts a launcher on your Desktop. Three commands and you're running a 31-billion-parameter language model on your MacBook.&lt;/p&gt;

&lt;p&gt;It has 2,689 stars and 516 forks as of this writing. License is MIT. The whole thing is at github.com/nicedreamzapp/claude-code-local.&lt;/p&gt;

&lt;p&gt;That's the engine. What I had not built was the funnel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap that ate me for a month
&lt;/h2&gt;

&lt;p&gt;Stars are people who think "I would love this." Forks are people who started doing the work. Neither converts to revenue.&lt;/p&gt;

&lt;p&gt;Reading that case study, what jumped out was: a lawyer with the technical chops to wire 66 agents could not productize the result. He has the buyer relationships (he is the buyer); I have the go-to-market background (I've sold consumer hardware direct to customers for years). The intersection nobody is shipping is "Mac mini with this stack pre-installed, delivered to your law firm."&lt;/p&gt;

&lt;p&gt;So I sat down and mapped what was actually missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The market gap, in numbers
&lt;/h2&gt;

&lt;p&gt;I researched the on-device AI market for May 2026. Here is what I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free OSS (Ollama, LM Studio, Jan, Atomic Bot, my own repo): saturated, no money flowing&lt;/li&gt;
&lt;li&gt;Paid Mac App Store apps (Private LLM, Enclave AI, Local LLM): $10-30 one-time, real revenue for bootstrapped 2-person teams&lt;/li&gt;
&lt;li&gt;Cloud SaaS with zero-data-retention (Spellbook, CoCounsel): $69-$149/mo per seat, cloud-not-local&lt;/li&gt;
&lt;li&gt;Enterprise legal AI (Harvey): $1,200+/seat/month&lt;/li&gt;
&lt;li&gt;AI-native law firms (Manifest, Avantia, General Legal): not selling tools, they ARE the firm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two things stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The gap between "free OSS" and "$1,200/seat SaaS" is not being filled by subscription products. Private LLM's App Store reviews explicitly call out "no subscription" as the reason buyers picked them. The privacy buyer rejects recurring billing for privacy software. This is not opinion; it is in their reviews.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Mac mini hardware bundle for privileged work is being built privately but not productized. Nobody is shipping it as a SKU.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the gap. That is what I am putting a product against.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ladder
&lt;/h2&gt;

&lt;p&gt;I'm skipping the standard SaaS playbook because the market is telling me to. The privacy buyer rejects recurring billing — so the spine is one-time purchases, not subscriptions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rung&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Audience&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free repo&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Existing OSS audience, top of funnel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AirGap Box&lt;/td&gt;
&lt;td&gt;$2,995 (Base) / $3,995 (Pro)&lt;/td&gt;
&lt;td&gt;Small firms wanting sovereignty without DIY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foundation consulting&lt;/td&gt;
&lt;td&gt;scoped&lt;/td&gt;
&lt;td&gt;Firms ready for deeper, white-glove deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free repo gets you curious. The Box converts the firms who would rather pay $3k than learn MLX. Foundation is for the firms that want it installed, tuned, and documented for their compliance counsel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three core agents
&lt;/h2&gt;

&lt;p&gt;Three agents that drop on top of any OpenAI-compatible local LLM server (the free claude-code-local repo by default, but they work with Ollama and others too) — and ship pre-installed on the Box:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Folder Watcher — drop a PDF, text, or Markdown file into ~/AirGap-Inbox/. Within 30 seconds, a structured Markdown summary appears in _summaries/. Uses macOS's textutil and mdimport for PDFs and docx without external dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Daily Briefing — every morning at 7:00 a LaunchAgent reads the folders you list in a config file and writes a one-page digest of what changed in the last 24 hours.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Local Q&amp;amp;A — a CLI command &lt;code&gt;airgap ask "your question"&lt;/code&gt; that answers from a single document you point it at, fully offline. Folder-wide indexing and citations across many files is on the roadmap — not something I'd oversell today.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole set is ~300 lines of Python. The installer is a &lt;code&gt;.command&lt;/code&gt; file you double-click. Total install time: under 60 seconds on the right hardware.&lt;/p&gt;

&lt;p&gt;I deliberately built them with zero authentication required. No IMAP credentials, no OAuth dance, no API keys — the install works for everyone on day one. The agents that need credentials (email drafting, Reddit lurking, calendar parsing) run on my own machines and ship in the AirGap Box, where there's a setup call to wire them up properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the AirGap Box
&lt;/h2&gt;

&lt;p&gt;The same stack on a Mac mini that arrives at your office. Pre-installed: claude-code-local MLX server, the three core agents, two more that need credentials (email drafter, prompt library), a default-blocked firewall, and a printed compliance memo template.&lt;/p&gt;

&lt;p&gt;Base ($2,995): Mac mini M4 16GB, Gemma 4 31B preloaded.&lt;br&gt;
Pro ($3,995): Mac mini M4 Pro 24GB, Llama 3.3 70B preloaded.&lt;/p&gt;

&lt;p&gt;Both include a 90-minute Zoom setup call and 30 days of email support. After that, you have a working private-AI workstation. We do not phone home. We do not see your data.&lt;/p&gt;

&lt;p&gt;COGS on the base unit is around $940. Margin around 69%. Cash-flow safe because Stripe charges at purchase; hardware ordered after.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation before scale
&lt;/h2&gt;

&lt;p&gt;I am not ordering inventory on speculation. The plan is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Day 0: launch publicly, with the Box gated by a waitlist (no checkout yet)&lt;/li&gt;
&lt;li&gt;Day 7: first review — Box waitlist signups&lt;/li&gt;
&lt;li&gt;Day 14: hard gate on Box — 20+ verified emails or we revisit positioning&lt;/li&gt;
&lt;li&gt;Day 30: full P&amp;amp;L review, decide whether to scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the Box waitlist hits 20 in 14 days, I order the first 3 Mac minis. If it hits 5, I have not validated the rung; I revisit positioning before spending hardware money.&lt;/p&gt;

&lt;p&gt;This is the part the grifter posts skip. "I made $14,200/month in 72 hours" is not a thing that happens to honest businesses. What happens to honest businesses is "I opened a waitlist, watched signups for two weeks, decided whether to order inventory based on actual demand, and reported the real numbers."&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest year-1 range
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;Pessimistic&lt;/th&gt;
&lt;th&gt;Realistic&lt;/th&gt;
&lt;th&gt;Stretch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$400&lt;/td&gt;
&lt;td&gt;$2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$200&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;$6,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;$800&lt;/td&gt;
&lt;td&gt;$3,500&lt;/td&gt;
&lt;td&gt;$12,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;$6,000-8,000&lt;/td&gt;
&lt;td&gt;$20,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To hit the grifter's $14k/month claim, every rung needs to perform near the top of its range AND consulting needs to land. That happens in months 12-18 with discipline, not in 72 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I will publish honestly
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Weekly waitlist + signup numbers (real, not vanity)&lt;/li&gt;
&lt;li&gt;The first refund (when it happens) and why&lt;/li&gt;
&lt;li&gt;The first Box install case study (with the firm's permission)&lt;/li&gt;
&lt;li&gt;The Box waitlist → order conversion, as it happens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see whether this business works, follow along on github.com/nicedreamzapp/claude-code-local and the AirGap landing pages. I'll post the numbers as they happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Free repo: github.com/nicedreamzapp/claude-code-local&lt;/li&gt;
&lt;li&gt;AirGap Box waitlist: nicedreamzwholesale.com/airgap-box&lt;/li&gt;
&lt;li&gt;Demo (NDA review on a laptop with Wi-Fi physically off, lsof on screen): youtube.com/watch?v=V_J1LpNGwmY&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I'm posting this
&lt;/h2&gt;

&lt;p&gt;Because the next person to read a case study like that and think "I want this" deserves to find a productized version, not another Hacker News thread about MLX setup. And because every honest version of this story I publish is also a receipt — for me, that I built something real, and for the next builder, that the playbook works.&lt;/p&gt;

&lt;p&gt;If you have feedback on the pricing, the positioning, or the agents, leave a comment. I read everything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building AirGap — pre-configured local AI on a Mac mini for firms handling private work. &lt;a href="https://nicedreamzwholesale.com/airgap-box" rel="noopener noreferrer"&gt;Join the AirGap Box waitlist&lt;/a&gt;, grab the &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;free open-source stack&lt;/a&gt;, or see &lt;a href="https://nicedreamzwholesale.com/airgap" rel="noopener noreferrer"&gt;AirGap consulting&lt;/a&gt; for compliance-sensitive firms (law, medical, finance).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>opensource</category>
      <category>macos</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why I Quantize Open-Weight Models for Macs — And Why Your Law Firm Should Care</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Fri, 22 May 2026 05:22:16 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/why-i-quantize-open-weight-models-for-macs-and-why-your-law-firm-should-care-2725</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/why-i-quantize-open-weight-models-for-macs-and-why-your-law-firm-should-care-2725</guid>
      <description>&lt;p&gt;Every couple of weeks a new instruction-tuned model hits Hugging Face, and within hours the GGUF community has six variants up for &lt;code&gt;llama.cpp&lt;/code&gt; users on Linux and Windows. The MLX community — the people running on M-series Macs — usually has to wait days or weeks, sometimes never.&lt;/p&gt;

&lt;p&gt;I publish a few MLX quantizations a month under &lt;a href="https://huggingface.co/divinetribe" rel="noopener noreferrer"&gt;huggingface.co/divinetribe&lt;/a&gt; to close that gap. Two of mine have crossed 1,000 downloads in the last 30 days. The latest, &lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt;, shipped today.&lt;/p&gt;

&lt;p&gt;The downloads tell me something I already suspected: there is real, sustained demand for capable models that run entirely on Apple Silicon. And I think the most interesting buyers are not who you'd guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hobbyist case is easy
&lt;/h2&gt;

&lt;p&gt;Mac developers want models that "just work" on their hardware. MLX uses Apple's unified memory and Metal Performance Shaders. When a model lands in MLX format, inference goes from "this works through a compatibility shim" to "this is what the hardware was built for." Tokens-per-second jumps. Battery drain falls. The fan stays quiet.&lt;/p&gt;

&lt;p&gt;That's fine. That's the easy demand to serve. Hobbyists, indie developers, students with M1 MacBook Airs. The downloads come in steady, and the audience expands a little every quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interesting case is law firms
&lt;/h2&gt;

&lt;p&gt;Here's the part nobody talks about.&lt;/p&gt;

&lt;p&gt;If you're a partner at a law firm handling NDAs, M&amp;amp;A docs, IP filings, sealed depositions — anything privileged — sending that content to OpenAI, Anthropic, or Google for "AI summarization" is, depending on your jurisdiction, somewhere between professionally negligent and outright disqualifying. The terms-of-service for every major cloud LLM provider explicitly retain the right to log your prompts. Many train on them. The opt-outs are partial at best and legally untested at worst.&lt;/p&gt;

&lt;p&gt;So firms either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pretend AI doesn't exist (most do, currently)&lt;/li&gt;
&lt;li&gt;Run an on-prem private cloud (six-figure setup, IT-heavy, hardware refresh every 18 months)&lt;/li&gt;
&lt;li&gt;Use one of the "enterprise-grade" SaaS LLM wrappers that promises not to log (the promise is contractual, not technical)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Or — and this is the option that almost nobody is talking about — you put a MacBook Pro on the partner's desk, install a local LLM, point a chat client at &lt;code&gt;localhost:8080&lt;/code&gt;, and the document never leaves the machine. Network-disconnect the laptop entirely during sensitive work, and you have a literal air gap. The prompt and the response live on a single piece of silicon owned by the firm.&lt;/p&gt;

&lt;p&gt;That's not theoretical. I've built that exact stack — it's the &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt; consulting practice. The 14-day pilot ships &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;claude-code-local&lt;/a&gt; (the on-device Claude Code replacement, currently 2,664 stars on GitHub) onto a firm-owned MacBook, with verified network audits proving nothing leaks. The models that run inside it are exactly the ones I publish on Hugging Face — Gemma 4 31B for everyday work, Llama 3.3 70B for harder reasoning, and now Hermes 4 14B for instruction-following without refusal noise.&lt;/p&gt;

&lt;p&gt;The legal sector is the obvious first market, but the same pitch works for any field where confidentiality has teeth: medical records, due diligence, journalism source protection, internal investigations, M&amp;amp;A under embargo, defense contracting. Anywhere "your prompt cached on someone else's server" is a deal-breaker, an on-device MLX model is the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why open weights specifically
&lt;/h2&gt;

&lt;p&gt;A cloud LLM is a black box. You send a prompt, the answer comes back, you trust that the provider isn't reading or training on it. The trust is enforced by a terms-of-service document and a privacy policy. If those change, your only recourse is to stop using the service. The content you already sent is, presumably, gone — but you have no way to verify.&lt;/p&gt;

&lt;p&gt;Open weights flip that around. The model file lives on the firm's hardware. The firm's IT team can inspect every byte. Network monitoring tools can confirm that no inference traffic leaves the building. The model is auditable in a way no SaaS API will ever be. If the underlying open-weight model gets pulled from Hugging Face tomorrow, the firm's copy keeps working forever.&lt;/p&gt;

&lt;p&gt;That permanence — the fact that today's open model is also next decade's open model, as long as someone keeps a copy — is the part of the pitch that I think clinches it for compliance officers. You can't subpoena Anthropic for a prompt you ran on your own MacBook in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do with this
&lt;/h2&gt;

&lt;p&gt;If you're a developer with a Mac, grab &lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt; and try it. It's ~8 GB, runs on a 16 GB Mac, and the install is &lt;code&gt;pip install mlx-lm&lt;/code&gt; plus three lines of Python. The model card has the recipe.&lt;/p&gt;

&lt;p&gt;If you're a partner, IT director, or in-house counsel at a firm that handles privileged content, &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;the AirGap AI pilot&lt;/a&gt; is the fastest path from "we're worried about AI confidentiality" to "we have a verified on-device setup." Two weeks, fixed scope, network audit included.&lt;/p&gt;

&lt;p&gt;If you're a researcher or quant who wants the next model on Apple Silicon before anyone else has it, &lt;a href="https://huggingface.co/divinetribe" rel="noopener noreferrer"&gt;follow divinetribe on Hugging Face&lt;/a&gt;. The release cadence is irregular but the targets are deliberate — I publish what I'd actually want to use.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is not
&lt;/h2&gt;

&lt;p&gt;I'm not anti-cloud. The frontier models from Anthropic and OpenAI are genuinely better at the hardest reasoning tasks, and for non-confidential work the cloud is the right default. I use Claude every day for my own coding.&lt;/p&gt;

&lt;p&gt;I'm also not promising that local-first is free. There's a real cost to running serious local inference: hardware, electricity, the time to keep the stack updated. For most workflows the cloud is cheaper.&lt;/p&gt;

&lt;p&gt;But for the cases where confidentiality is non-negotiable — and there are more of those than the AI industry currently admits — local-first is the only honest answer. Open weights, Apple Silicon, MLX. That's the stack. I'll keep publishing pieces of it on Hugging Face for as long as the downloads tell me people are using them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The new model: &lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;All MLX models I maintain: &lt;a href="https://huggingface.co/divinetribe" rel="noopener noreferrer"&gt;huggingface.co/divinetribe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Showcase + cross-links: &lt;a href="https://nicedreamzwholesale.com/software/huggingface/" rel="noopener noreferrer"&gt;nicedreamzwholesale.com/software/huggingface/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;On-device Claude Code: &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The consulting pilot: &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;14-day on-device AI pilot for law firms, medical orgs, and any team where privileged content can't leave the machine — see &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mlx</category>
      <category>applesilicon</category>
      <category>localai</category>
    </item>
    <item>
      <title>Hermes-4-14B Abliterated, MLX 4-bit — Apple Silicon Just Got Another Real Model</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Fri, 22 May 2026 05:20:56 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/hermes-4-14b-abliterated-mlx-4-bit-apple-silicon-just-got-another-real-model-34b7</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/hermes-4-14b-abliterated-mlx-4-bit-apple-silicon-just-got-another-real-model-34b7</guid>
      <description>&lt;p&gt;When Babsie uploaded &lt;code&gt;Hermes-4-14B-BF16-abliterated&lt;/code&gt; to Hugging Face yesterday, the only way to run it on a Mac was to download 28 GB of BF16 weights and feed them to &lt;code&gt;transformers&lt;/code&gt; — which on Apple Silicon means falling back to PyTorch MPS, which is fine but not what the hardware was built for.&lt;/p&gt;

&lt;p&gt;So I converted it to MLX 4-bit. ~8 GB on disk, runs at the speed Apple Silicon was actually designed for, and the model card is now live at:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free, open weights, drop-in for any &lt;code&gt;mlx-lm&lt;/code&gt; workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;Hermes 4 is NousResearch's instruction-tuned model family. The 14B variant is built on Qwen3, so it inherits Qwen3's tokenizer, chat template, and architectural quirks — but the post-training is pure Hermes, which means it's tuned for tool use, role-play, structured output, and not refusing benign-but-edgy questions.&lt;/p&gt;

&lt;p&gt;Babsie's abliteration applies refusal-direction projection per Arditi et al. (2024) to the BF16 weights, which suppresses the model's built-in refusal vector. Plain English: the model will answer questions that the upstream Hermes would politely decline. You become the moderator.&lt;/p&gt;

&lt;p&gt;My contribution is the boring-but-useful part: convert those BF16 weights to MLX, quantize to 4 bits with group size 64, ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for Mac users
&lt;/h2&gt;

&lt;p&gt;14B is the sweet spot on Apple Silicon. Big enough to be genuinely useful for instruction-following, small enough to fit comfortably in 16 GB of unified memory at 4-bit, and tiny enough on disk that you can keep a half-dozen variants without thinking about it.&lt;/p&gt;

&lt;p&gt;For comparison:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Llama 3.3 70B 8-bit MLX (also on my Hugging Face) — 75 GB, needs a 96 GB machine&lt;/li&gt;
&lt;li&gt;Gemma 4 31B 4-bit MLX — 17 GB, runs on any 32 GB Mac&lt;/li&gt;
&lt;li&gt;Hermes 4 14B 4-bit MLX — 8 GB, runs on a 16 GB Mac, snappy on anything bigger&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have an M-series Mac and you want a capable instruction-tuned model that doesn't refuse benign requests and doesn't phone home, this is the easiest install on the list.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to run it
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;mlx_lm&lt;/code&gt; package does all the heavy lifting. From a terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mlx-lm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mlx_lm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;divinetribe/Hermes-4-14B-abliterated-4bit-mlx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a haiku about local inference.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or as a server, using &lt;code&gt;mlx-lm&lt;/code&gt;'s OpenAI-compatible endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mlx_lm.server &lt;span class="nt"&gt;--model&lt;/span&gt; divinetribe/Hermes-4-14B-abliterated-4bit-mlx &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a local &lt;code&gt;http://localhost:8080/v1/chat/completions&lt;/code&gt; endpoint that any OpenAI SDK client can hit. No tokens, no API bills, no telemetry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it slots into the local-first stack
&lt;/h2&gt;

&lt;p&gt;I maintain &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;claude-code-local&lt;/a&gt; on GitHub — a project that lets you run Claude Code 100% on-device using local MLX models in place of the Anthropic API. Hermes 4 14B abliterated is now a supported backend, sitting between Gemma 4 31B (the everyday workhorse) and Llama 3.3 70B (the heavy lifter) in terms of size and speed.&lt;/p&gt;

&lt;p&gt;For workflows where refusals are noise (security research, fiction with edge, prompt-engineering experiments where the same prompt needs to hit both refusing and non-refusing models for comparison), the abliterated variant saves a lot of re-prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I publish these
&lt;/h2&gt;

&lt;p&gt;From where I sit, the Apple Silicon community is underserved on model variety. The big quantization shops mostly target llama.cpp and GGUF, which is fantastic for Linux/Windows GPU workflows but adds a translation layer on Macs. MLX is Apple's own framework, and when a model lands in MLX format the experience on a Mac goes from "this works" to "this is what the hardware is for."&lt;/p&gt;

&lt;p&gt;So a few times a month, when a hot new model drops, I take the abliterated BF16 (if a trusted shop has done the abliteration upstream) and run it through &lt;code&gt;mlx_lm.convert&lt;/code&gt;. Takes about an hour from &lt;code&gt;git pull&lt;/code&gt; to &lt;code&gt;huggingface-cli upload&lt;/code&gt;. Costs me ~80 GB of disk during the build and ~8 GB to keep around.&lt;/p&gt;

&lt;p&gt;The downloads page tells me people are finding them. Llama 3.3 70B 8-bit MLX and Gemma 4 31B 4-bit MLX each pull ~1,000 downloads every 30 days, almost entirely from Mac users running them through &lt;code&gt;mlx-lm&lt;/code&gt; or &lt;code&gt;claude-code-local&lt;/code&gt;. Hermes 4 14B should be more accessible than either, since 16 GB Macs are everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;I'm watching for the BF16 abliterated source of DeepSeek V4 Flash to drop publicly. The GGUF version (&lt;code&gt;cyberneurova/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF&lt;/code&gt;) already has 86 K downloads, but the BF16 source isn't published yet — the moment it lands, I'll be the first to ship an MLX 4-bit. DeepSeek V4 Flash on Apple Silicon at ~140 GB is going to be a moment for people with 128 GB+ Macs.&lt;/p&gt;

&lt;p&gt;Until then: enjoy Hermes 4 14B. The model card has the full install recipe and a benchmark snippet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Model on Hugging Face: &lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;divinetribe/Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://huggingface.co/Babsie/Hermes-4-14B-BF16-abliterated" rel="noopener noreferrer"&gt;Babsie/Hermes-4-14B-BF16-abliterated&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;All my MLX models: &lt;a href="https://huggingface.co/divinetribe" rel="noopener noreferrer"&gt;huggingface.co/divinetribe&lt;/a&gt; (or the showcase page at &lt;a href="https://nicedreamzwholesale.com/software/huggingface/" rel="noopener noreferrer"&gt;nicedreamzwholesale.com/software/huggingface/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Local Claude Code: &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Local-AI consulting for compliance-sensitive firms: &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Local-AI consulting for compliance-sensitive firms (law, medical, finance) — see &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;. Model: &lt;a href="https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx" rel="noopener noreferrer"&gt;https://huggingface.co/divinetribe/Hermes-4-14B-abliterated-4bit-mlx&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mlx</category>
      <category>applesilicon</category>
      <category>hermes</category>
    </item>
    <item>
      <title>504,571 Brain Cells, 4 Labs, One Hypothesis: A Citizen Pass at Parkinson's</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Fri, 22 May 2026 02:03:23 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/504571-brain-cells-4-labs-one-hypothesis-a-citizen-pass-at-parkinsons-2imj</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/504571-brain-cells-4-labs-one-hypothesis-a-citizen-pass-at-parkinsons-2imj</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/bC4hgeHS9cg" rel="noopener noreferrer"&gt;Watch the companion video — narrated walkthrough of the meta-analysis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I want to say upfront that I am not a neuroscientist. I run a small company in Humboldt County, California, and most of my days are about shipping orders and writing code. But over the last few weeks I spent my nights pulling down four public single-cell RNA datasets and running the same statistical pass across all of them, and the result was clean enough that I think it is worth talking about.&lt;/p&gt;

&lt;p&gt;The short version: when you pool 504,571 single-cell measurements of human midbrain tissue from four different research groups, one specific subtype of dopamine neuron — the cells that express the angiotensin II type 1 receptor, AGTR1 — is depleted in every single Parkinson's dataset compared to controls. Combined odds ratio 0.215. P-value less than 10 to the negative one hundredth power. That is not a marginal effect. That is one of the most depleted druggable cell types I have ever seen reported, and the receptor is already targetable by FDA-approved blood-pressure drugs that have been on the market for thirty years.&lt;/p&gt;

&lt;p&gt;That is the headline. Here is the longer story of how I got there, what I think it means, and — more importantly — what I do not think it means, because the easiest way to embarrass yourself in a field you do not belong to is to overclaim.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the idea actually came from
&lt;/h2&gt;

&lt;p&gt;I did not discover this. Tushar Kamath and his colleagues at the Broad Institute published the original observation in Nature Neuroscience in 2022. They ran single-nucleus RNA sequencing on postmortem human midbrain tissue from Parkinson's patients and controls, and they noticed that one subtype of dopamine neuron — defined by the markers SOX6 and AGTR1 — was preferentially lost in disease. It was a careful paper, peer-reviewed, well-cited. It did the science. What it did not do, and what no single study can ever do, is rule out the possibility that the effect they saw was an artifact of their particular cohort, their particular dissection protocol, or their particular sequencing chemistry.&lt;/p&gt;

&lt;p&gt;That is what meta-analysis is for. You take the same biological question and you ask it of as many independent datasets as you can find. If the effect is real, it shows up across cohorts. If it is artifact, it washes out.&lt;/p&gt;

&lt;p&gt;I am a hobbyist in this field, but I am not bad at this part of it. I have spent the last year teaching myself single-cell analysis on the side. I knew that since Kamath 2022, three more independent groups had released public single-cell Parkinson's datasets — Smajić et al. at DZNE in Germany, Wang et al. at Mount Sinai, and Martirosyan et al., an independent cohort released in 2024. None of them, as far as I could find, had run the AGTR1 question across each other's data. So I did.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually ran
&lt;/h2&gt;

&lt;p&gt;The pipeline is unsexy. Python, scanpy, statsmodels. Pull each dataset from GEO. Normalize. Run the same cell-typing logic on each one so the labels are consistent across studies. Count the fraction of dopamine neurons in each donor that fall into the AGTR1-positive subtype. Compare patients to controls. Pool the four resulting odds ratios using fixed-effect meta-analysis with effect-size weighting.&lt;/p&gt;

&lt;p&gt;That is it. There is no fancy machine learning, no transformer, no proprietary model. The whole repository is a few thousand lines of straightforward bioinformatics that anyone with a laptop can rerun. I made sure of that on purpose. The code is on GitHub, the data is public, the figures are reproducible end to end with a single shell script.&lt;/p&gt;

&lt;p&gt;The four-cohort breakdown looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Cells&lt;/th&gt;
&lt;th&gt;Control AGTR1+ fraction&lt;/th&gt;
&lt;th&gt;PD AGTR1+ fraction&lt;/th&gt;
&lt;th&gt;Odds ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GSE184950&lt;/td&gt;
&lt;td&gt;Mount Sinai&lt;/td&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;12,778&lt;/td&gt;
&lt;td&gt;3.31%&lt;/td&gt;
&lt;td&gt;1.00%&lt;/td&gt;
&lt;td&gt;0.295&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GSE178265&lt;/td&gt;
&lt;td&gt;Broad Institute&lt;/td&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;366,874&lt;/td&gt;
&lt;td&gt;3.30%&lt;/td&gt;
&lt;td&gt;0.60%&lt;/td&gt;
&lt;td&gt;0.177&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GSE157783&lt;/td&gt;
&lt;td&gt;DZNE Germany&lt;/td&gt;
&lt;td&gt;2022&lt;/td&gt;
&lt;td&gt;41,435&lt;/td&gt;
&lt;td&gt;3.30%&lt;/td&gt;
&lt;td&gt;1.00%&lt;/td&gt;
&lt;td&gt;0.296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GSE243639&lt;/td&gt;
&lt;td&gt;Independent&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;83,484&lt;/td&gt;
&lt;td&gt;2.96%&lt;/td&gt;
&lt;td&gt;0.89%&lt;/td&gt;
&lt;td&gt;0.295&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every single one points in the same direction. Every single one. The combined odds ratio of 0.215 means roughly a 78 percent reduction in this cell type in Parkinson's brains across half a million cells from four independent cohorts run by four independent groups using slightly different protocols. That is the kind of consistency you almost never see in a noisy field like single-cell genomics, and it is the kind of consistency that — if I were a real neuroscientist running a real lab — would make me drop my other projects and chase this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this could actually matter
&lt;/h2&gt;

&lt;p&gt;AGTR1 is the receptor that angiotensin II binds to. It is the same receptor that gets blocked by the entire class of drugs called angiotensin receptor blockers — ARBs. Losartan. Candesartan. Telmisartan. These are some of the most widely prescribed blood-pressure drugs on Earth. They have been FDA-approved for decades. Their safety profile is extremely well characterized. Several of them cross the blood-brain barrier in measurable amounts.&lt;/p&gt;

&lt;p&gt;There is already a small body of epidemiological literature suggesting that long-term ARB users have a lower risk of developing Parkinson's. There is animal-model work showing that ARBs are neuroprotective in MPTP and 6-OHDA mouse models of Parkinson's. There is a 2025 iPSC paper showing that pharmacological inhibition of AGTR1 is pro-survival in human dopamine neurons in a dish. None of this is brand new. What I think is new is the cleanly meta-analyzed cross-cohort confirmation that the cells that get lost in Parkinson's are exactly the cells that express the receptor those drugs hit.&lt;/p&gt;

&lt;p&gt;If a clinical trial were ever run — and I am not in any position to run one — the hypothesis would be: in early-stage Parkinson's patients, does adding a brain-penetrant ARB to standard care slow the rate of dopaminergic decline? It is the kind of trial that, on paper, costs maybe a few million dollars instead of the half-billion that a brand-new drug would cost, because the drugs already exist and are already off-patent.&lt;/p&gt;

&lt;p&gt;I am genuinely hopeful that someone with the credentials to do this picks it up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I want to be very careful not to claim
&lt;/h2&gt;

&lt;p&gt;I am going to say this part plainly because I have watched too many people in adjacent fields ruin their credibility by overclaiming. So, what this analysis does not do:&lt;/p&gt;

&lt;p&gt;It does not prove ARBs treat Parkinson's. A retrospective bioinformatics finding is a hypothesis-generation step. The wet lab is where biology actually gets tested. Until somebody runs a real trial in real patients, this is a target that looks promising on paper, nothing more.&lt;/p&gt;

&lt;p&gt;It does not establish causality. AGTR1-positive neurons being depleted in Parkinson's brains tells us they are vulnerable. It does not tell us why. It could be that AGTR1 signaling itself drives their death, in which case blocking it should help. It could also be that something else is killing them and AGTR1 is just a marker of a particular cell type that happens to be vulnerable for some other reason entirely, in which case blocking AGTR1 might do nothing. Wet lab work is the only way to distinguish those two possibilities.&lt;/p&gt;

&lt;p&gt;It does not mean I am right and the field has missed something. The field has not missed this. Kamath saw it in 2022. Several groups have followed up. What might be missing — and where I think a citizen-analysis approach actually adds value — is the rigorous cross-cohort confirmation in a single place that anyone can reproduce. That is the thing I can contribute as a hobbyist. The hard biology is somebody else's job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I am publishing this
&lt;/h2&gt;

&lt;p&gt;A few reasons.&lt;/p&gt;

&lt;p&gt;Mainly, I think this is what open science should look like. The datasets are public. The code is public. The methods are standard. The result is checkable in an afternoon by anyone who downloads the repository and runs the shell script. If I made a mistake, I want someone to find it and tell me. If I did not make a mistake, I want the people who can move this forward — neuroscientists, immunologists, drug-discovery teams, clinical trialists — to know it is there.&lt;/p&gt;

&lt;p&gt;Also, honestly, because the more I read about Parkinson's the more I understand that it is not a far-off problem. About a million people in the United States are living with it right now. The diagnosis rate is rising. The treatments are decent for the early symptoms and bad for the long-term ones. The next breakthrough is not going to come from one person. It is going to come from a lot of small, well-aimed contributions stacking up. I would like this to be one of them.&lt;/p&gt;

&lt;p&gt;If you are a neuroscientist, a clinician, an immunologist, a drug-development researcher, or a Parkinson's advocate and any of this lines up with something you are working on — I would genuinely love to hear from you. The repository has a contact section.&lt;/p&gt;

&lt;p&gt;The repository: &lt;a href="https://github.com/nicedreamzapp/parkinsons-vulnerability-predictor" rel="noopener noreferrer"&gt;https://github.com/nicedreamzapp/parkinsons-vulnerability-predictor&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Companion video walking through the full meta-analysis in about a minute: &lt;a href="https://youtu.be/bC4hgeHS9cg" rel="noopener noreferrer"&gt;https://youtu.be/bC4hgeHS9cg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will keep working on this. The next pass is going to look at whether the AGTR1 effect tracks with disease severity within each cohort, and whether the same signal shows up in Lewy Body Dementia, which shares biology with Parkinson's. If anything interesting comes out, it will go up on the same repository, in the open, with the code attached.&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

&lt;p&gt;— Matt&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Companion video on YouTube: &lt;a href="https://youtu.be/bC4hgeHS9cg" rel="noopener noreferrer"&gt;https://youtu.be/bC4hgeHS9cg&lt;/a&gt;. Full reproducible repo: &lt;a href="https://github.com/nicedreamzapp/parkinsons-vulnerability-predictor" rel="noopener noreferrer"&gt;https://github.com/nicedreamzapp/parkinsons-vulnerability-predictor&lt;/a&gt;. I run &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz LLC&lt;/a&gt; and consult on private/local AI for compliance-sensitive firms via &lt;a href="https://nicedreamzwholesale.com/airgap" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>parkinsons</category>
      <category>neuroscience</category>
      <category>bioinformatics</category>
      <category>singlecell</category>
    </item>
    <item>
      <title>I Just Watched One Hacker Catch Up to a Trillion-Dollar Data Center</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Sun, 10 May 2026 05:16:13 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/i-just-watched-one-hacker-catch-up-to-a-trillion-dollar-data-center-404f</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/i-just-watched-one-hacker-catch-up-to-a-trillion-dollar-data-center-404f</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/7l8-s8xkpms" rel="noopener noreferrer"&gt;▶ Watch the companion video — three engines, one prompt, one MacBook&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Yesterday Salvatore Sanfilippo — the guy who wrote Redis 15 years ago and ran it solo for over a decade — published a few thousand lines of C code and quietly changed what counts as possible on a personal laptop.&lt;/p&gt;

&lt;p&gt;The project is called &lt;code&gt;ds4&lt;/code&gt;. It's a hand-written native inference engine, Metal kernels and all, built for one specific model: &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt;. A 284-billion-parameter Mixture-of-Experts model with a &lt;strong&gt;1-million-token context window&lt;/strong&gt;. Until last week, that lived inside the kind of GPU clusters that bill more per hour than my truck.&lt;/p&gt;

&lt;p&gt;I'm running it on the laptop I'm typing this on.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually did
&lt;/h2&gt;

&lt;p&gt;Today I gave the same prompt to three different AI engines. The same prompt, on the same MacBook:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Build an animated northern lights scene in a single HTML file — mountains, pine trees, twinkling stars, and a flowing aurora."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three engines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; running locally through &lt;code&gt;ds4&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Claude&lt;/strong&gt; through my Max plan, hitting Anthropic's data center&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 31B&lt;/strong&gt; running locally through MLX&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then I watched what came out.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Hosted on&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash (ds4 local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;103 s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,259 tokens&lt;/td&gt;
&lt;td&gt;my laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Claude (Max plan)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;192 s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~3,500 tokens&lt;/td&gt;
&lt;td&gt;Anthropic's GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B (MLX local)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;131 s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,992 tokens&lt;/td&gt;
&lt;td&gt;my laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Local-first DeepSeek beat cloud Claude on raw wall-clock time. Let that one sit for a second.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the auroras actually looked like
&lt;/h2&gt;

&lt;p&gt;Three completely different interpretations of the same prompt — which is the part that surprised me.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek's aurora&lt;/strong&gt; was a flowing ribbon of teal and lavender drifting over a dense pine forest. The trees got a subtle green underglow from the sky. Most cohesive of the three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Claude's&lt;/strong&gt; went the most cinematic — magenta and teal aurora bands draped across jagged mountains, with a soft luminescent dusting along the peaks. It looks like a desktop wallpaper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma's&lt;/strong&gt; went minimalist — a single sweeping streak of green and violet against a starry sky, with a clean line-drawing mountain silhouette. Stylized, almost graphic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one runs forever. None of them needed a network call.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means for the cloud AI bill
&lt;/h2&gt;

&lt;p&gt;I've written before about how my Claude Max plan returns roughly 40x value compared to API rates. That math is still true. Anthropic is still subsidizing the difference.&lt;/p&gt;

&lt;p&gt;But this changes the calculation. If the cloud got expensive tomorrow — or if my data needed to stay on-device — I now have a credible escape hatch. Not "good enough for testing." Actually frontier-class. Actually long context. Actually integrated with Claude Code.&lt;/p&gt;

&lt;p&gt;The hybrid is what I'm settling into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;claude&lt;/code&gt; for the hardest reasoning tasks I do all day&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;claude-ds4&lt;/code&gt; for routine work, long-context document review, and anything I'd rather keep on this machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The meter at the top of my screen no longer goes up for most of what I do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three things &lt;code&gt;ds4&lt;/code&gt; does that nobody else has bundled
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Asymmetric 2-bit quantization.&lt;/strong&gt; Only the routed Mixture-of-Experts experts get compressed to 2-bit — about 90% of the weight footprint. Every quality-critical path (shared experts, attention projections, output head) stays at full precision. The result is an 81 GB file that calls tools cleanly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;KV cache moved to disk.&lt;/strong&gt; Modern Apple SSDs are fast enough that "the KV cache must live in RAM" is just no longer true in 2026. &lt;code&gt;ds4&lt;/code&gt; writes session state to disk and reuses it across restarts. The first 25k-token Claude Code system prompt gets prefilled exactly once, ever, and replays from cache after that.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pure Metal native code.&lt;/strong&gt; No PyTorch, no TensorFlow, no llama.cpp wrapper layer. The hot path is Metal compute kernels written for one specific 284B model. M3 Max gets ~27 tokens/sec; M5 Max hits ~32.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What I'm doing next
&lt;/h2&gt;

&lt;p&gt;The voice agent stack I've been building — wake-word listening, voiceprint filtering, transcripts piped straight into Claude Code — has been waiting for exactly this. Cloud Claude was good enough to test with, but routing every utterance through someone else's API meant every long agent run had a meter on it.&lt;/p&gt;

&lt;p&gt;This week the meter goes away. Same agent, same tools, same cloned voice — running on a single laptop, off-cloud, with a context window five times longer than what I had on the Max plan.&lt;/p&gt;

&lt;p&gt;I'll write that one up next.&lt;/p&gt;

&lt;p&gt;For now, if you've got 128 GB of RAM and a free hour: clone &lt;code&gt;antirez/ds4&lt;/code&gt;, run &lt;code&gt;make&lt;/code&gt;, run &lt;code&gt;./download_model.sh q2&lt;/code&gt;, and see for yourself.&lt;/p&gt;

&lt;p&gt;May 9, 2026. The day a single C file caught up to the data centers.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Companion repo: &lt;a href="https://github.com/antirez/ds4" rel="noopener noreferrer"&gt;github.com/antirez/ds4&lt;/a&gt; · Hugging Face weights: &lt;code&gt;antirez/deepseek-v4-gguf&lt;/code&gt; · Model card: &lt;code&gt;deepseek-ai/DeepSeek-V4-Flash&lt;/code&gt; · Companion video: [YouTube link TBD]&lt;/em&gt;&lt;/p&gt;







&lt;p&gt;&lt;em&gt;Companion video on YouTube: &lt;a href="https://youtu.be/7l8-s8xkpms" rel="noopener noreferrer"&gt;https://youtu.be/7l8-s8xkpms&lt;/a&gt;. For local-AI consulting for compliance-sensitive firms (law, medical, finance), see &lt;a href="https://nicedreamzwholesale.com/airgap" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>ds4</category>
      <category>antirez</category>
    </item>
    <item>
      <title>HumanEval on a MacBook — 81.7% pass@1, Wi-Fi off</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:11:49 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/humaneval-on-a-macbook-817-pass1-wi-fi-off-22ap</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/humaneval-on-a-macbook-817-pass1-wi-fi-off-22ap</guid>
      <description>&lt;p&gt;The M5 Max MacBook Pro with 128 GB of unified memory is the first laptop that can hold a frontier-class coding agent entirely in RAM. No GPU rack. No cloud. No subscription.&lt;/p&gt;

&lt;p&gt;I just ran HumanEval on it. Wi-Fi off the entire run.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;81.7% pass@1&lt;/strong&gt; on the full 164-problem benchmark&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 3 Coder 30B-A3B-Instruct&lt;/strong&gt; (8-bit MLX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;14 minutes&lt;/strong&gt; wall-clock, &lt;strong&gt;$0/month&lt;/strong&gt; after the model download&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;YouTube walkthrough (three real problems, code streaming live, tests going green):&lt;br&gt;
&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=muq7VdgxqRk" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=muq7VdgxqRk&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this number matters
&lt;/h2&gt;

&lt;p&gt;The Qwen team didn't publish HumanEval scores for any Qwen3-Coder variant — they consider the benchmark saturated and went straight to agentic ones (SWE-bench Verified, BFCL, Aider-Polyglot). For the 30B variant — the one that actually fits on a laptop — there were no published HumanEval/MBPP numbers. Until this run.&lt;/p&gt;

&lt;p&gt;I also ran &lt;strong&gt;MBPP (sanitized): 83.3% pass@1&lt;/strong&gt; on a 168-problem sample. Pass rate stable since n=120; full 427-run was impractical because a few outlier tasks induce very long model responses (10+ minutes each).&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark&lt;/td&gt;
&lt;td&gt;HumanEval — 164 Python tasks (full)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metric&lt;/td&gt;
&lt;td&gt;pass@1 (first attempt only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temperature&lt;/td&gt;
&lt;td&gt;0.0 — deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sampling&lt;/td&gt;
&lt;td&gt;single sample per problem, no best-of-N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;Python subprocess, 10s timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;M5 Max MacBook Pro · 128 GB unified memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Qwen3-Coder-30B-A3B-Instruct-MLX-8bit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;Wi-Fi &lt;strong&gt;OFF&lt;/strong&gt; the entire run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wall clock&lt;/td&gt;
&lt;td&gt;14 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  For context — Qwen3-Coder 480B's official agentic benchmarks
&lt;/h2&gt;

&lt;p&gt;The Qwen team's published numbers for the 480B flagship sibling (the bigger sibling of the 30B running on this MacBook):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Qwen3-Coder 480B&lt;/th&gt;
&lt;th&gt;Claude Sonnet 4&lt;/th&gt;
&lt;th&gt;GPT-4.1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench Verified (500-turn)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70.4&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;35.5&lt;/td&gt;
&lt;td&gt;25.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BFCL-v3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;73.3&lt;/td&gt;
&lt;td&gt;62.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider-Polyglot&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;61.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56.4&lt;/td&gt;
&lt;td&gt;52.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://qwenlm.github.io/blog/qwen3-coder/" rel="noopener noreferrer"&gt;Qwen team's official blog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the offline part matters
&lt;/h2&gt;

&lt;p&gt;If a tool needs the internet, three things are true:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Someone else can read what you sent.&lt;/li&gt;
&lt;li&gt;Someone else can charge you for it.&lt;/li&gt;
&lt;li&gt;Someone else can take it away.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the same tool runs locally, none of those are true. That's a different category of software — and for law firms, medical practices, and accountants handling client material, it's the only legal one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reproduce it yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Open-source launchers: &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HumanEval dataset: &lt;a href="https://github.com/openai/human-eval" rel="noopener noreferrer"&gt;github.com/openai/human-eval&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hardware: any M-series MacBook with ≥32 GB RAM (128 GB Max preferred for full 8-bit weights)&lt;/li&gt;
&lt;li&gt;Total monthly cost: &lt;strong&gt;$0&lt;/strong&gt; after the model download&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For law firms, medical practices, and accountants who want help getting this stack running on their own hardware — that's what &lt;a href="https://nicedreamzwholesale.com/airgap" rel="noopener noreferrer"&gt;AirGap&lt;/a&gt; is. 14-day pilot, fixed scope, the data never leaves your machines.&lt;/p&gt;

&lt;p&gt;— matt&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>benchmark</category>
      <category>macbook</category>
    </item>
    <item>
      <title>Pulling 10x My Subscription Value Out of Claude — While Quietly Building the Backup Plan</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:06:41 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/pulling-10x-my-subscription-value-out-of-claude-while-quietly-building-the-backup-plan-168f</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/pulling-10x-my-subscription-value-out-of-claude-while-quietly-building-the-backup-plan-168f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid8wcc2v7vox7rqltj91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid8wcc2v7vox7rqltj91.png" alt="21 days of Claude Code: $2,976 of API value consumed for $100 paid (Claude Max 5x subscription, pro-rated to $3.33/day shown as baseline)" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The math, visualized: every blue bar is one day's API-equivalent token consumption. The green dashed line at the bottom is what I actually paid (pro-rated). April 14 alone — $454 in one day — was more than four months of subscription.&lt;/em&gt;&lt;/p&gt;







&lt;p&gt;Every Sunday night I watch the meter tick toward 100% again. That's been the rhythm for months — five days of heavy work, one day of cleanup, one day of waiting for the weekly reset. I'm on Claude Max — usually the 5x tier at $100 a month — and I burn through nearly every token they give me.&lt;/p&gt;

&lt;p&gt;Out of curiosity I ran the math last week. I'm not sure I should have.&lt;/p&gt;

&lt;p&gt;Over the last three weeks, the tokens I've put through Claude Code added up to about &lt;strong&gt;$2,976 worth of API usage&lt;/strong&gt; at Anthropic's published rates. Pro-rated, my subscription cost over that same window was about &lt;strong&gt;$70&lt;/strong&gt;. One Tuesday in mid-April, I spent &lt;strong&gt;$454 of token-equivalent value in a single day&lt;/strong&gt; — more than four months of subscription, in one sitting.&lt;/p&gt;

&lt;p&gt;The math doesn't quite work. Which is exactly what I want to talk about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest comparison
&lt;/h2&gt;

&lt;p&gt;If I were paying per-token through the API to do the work I'm doing — coding, writing, agent loops, the whole pipeline — I'd be looking at roughly $1,000 a week at the rate I'm running. The 5x Max subscription pro-rates to about $23 a week.&lt;/p&gt;

&lt;p&gt;That's a ~40x multiplier. The headline says 10x because I wanted to be conservative. The real number is about four times that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why is Anthropic eating the difference?
&lt;/h2&gt;

&lt;p&gt;This is the part I keep turning over. Anthropic isn't a charity — they have GPU bills, payroll, and investors who eventually want margins. So why hand power users $1,000 of compute for $23 a week?&lt;/p&gt;

&lt;p&gt;From where I sit, three reasons line up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Habit formation beats price optimization.&lt;/strong&gt; A heavy API user shops on price every quarter. A subscriber builds workflows, custom agents, deep tool integrations — and wakes up two years later unable to leave. Subscriptions are sticky. API calls are mercenary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predictable revenue is worth a discount.&lt;/strong&gt; Investors love subscription math. $100/mo × 100,000 power users is a number you can model. Lumpy API usage isn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Land-grab while the war is hot.&lt;/strong&gt; OpenAI, Google, and Anthropic are all fighting for the same developer seats right now. Subsidizing the heavy users means &lt;em&gt;those users&lt;/em&gt; go tell other developers "you have to try this." I've done it myself a dozen times. That word-of-mouth is cheaper than ads.&lt;/p&gt;

&lt;p&gt;So the subsidy is rational. It's also temporary.&lt;/p&gt;




&lt;h2&gt;
  
  
  The tells it's already tightening
&lt;/h2&gt;

&lt;p&gt;A year ago there were no weekly limits on Claude Pro or Max. The tier was straightforward: pay, use it. Now there's a weekly cap that resets on a rolling schedule, and Opus has its own quota separated from Sonnet.&lt;/p&gt;

&lt;p&gt;That's the rug being pulled tighter, one notch every few months. Anyone who's watched this pattern in cloud services or streaming knows where it ends — not with the deal disappearing, but with the deal getting expensive enough that you have to think before you use it. I'd rather adjust now than be surprised.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I built quietly on the side
&lt;/h2&gt;

&lt;p&gt;I love Claude Max. I'd be foolish not to. But I wouldn't bet my business on a subsidized line item with a tightening cap, so I built insurance.&lt;/p&gt;

&lt;p&gt;My primary machine is an M5 Max MacBook Pro with 128 GB of RAM. On it I run Gemma 4 31B as my daily local driver — fast, capable enough for editing and routine code work, and it doesn't phone home. The bigger picture is a three-node mesh: the M5 as workstation, a Mac Mini as workhorse, and a small VPS as the gateway. Voice cloning, content drafting, simple agent loops — those run locally now.&lt;/p&gt;

&lt;p&gt;They're not as smart as Opus. They don't need to be. The split: cloud Claude does the hard, frontier-grade thinking. Local models do the steady-state work — the kind that, if Anthropic doubled prices tomorrow, I wouldn't want to be paying premium rates for anyway. I'd rather pay once for hardware I own than be one pricing email away from a forced workflow change.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest takeaway
&lt;/h2&gt;

&lt;p&gt;Right now is one of the strangest pricing windows in software history. You can rent the most capable AI on the planet for less than a decent meal out and use it like a senior engineer who never sleeps. Use it hard, build the workflows, learn everything the subsidy will let you.&lt;/p&gt;

&lt;p&gt;The people still working at this pace in eighteen months won't be the ones who picked a side. They'll be the ones running both — getting maximum leverage from cloud Claude while making sure their work doesn't depend on it lasting.&lt;/p&gt;

&lt;p&gt;Forty times my money's worth. I'm grateful. I'm also paying attention.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Matt Macosko runs Divine Tribe and Nice Dreamz LLC out of Northern California. He writes about local AI, ambient computing, and what it actually takes to run a small business with frontier-grade tools.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://nicedreamzwholesale.com/2026/04/26/pulling-10x-my-subscription-value-out-of-claude-while-quietly-building-the-backup-plan/" rel="noopener noreferrer"&gt;Nice Dreamz Wholesale&lt;/a&gt;. For local-AI consulting for compliance-sensitive firms (law, medical, finance), see &lt;a href="https://nicedreamzwholesale.com/airgap" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>localai</category>
      <category>ambientcomputing</category>
    </item>
    <item>
      <title>Free AI on a MacBook vs $100-a-Month Claude Code — Hexagon Shootout</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Thu, 23 Apr 2026 04:32:47 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/free-ai-on-a-macbook-vs-100-a-month-claude-code-hexagon-shootout-5h1o</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/free-ai-on-a-macbook-vs-100-a-month-claude-code-hexagon-shootout-5h1o</guid>
      <description>&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=2KeTDDodE0A" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kv4avef0epmb8pv5jbv.jpg" alt="FREE AI on a MacBook vs Claude Cloud — Hexagon Shootout" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;▶ Watch the race on YouTube:&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=2KeTDDodE0A" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=2KeTDDodE0A&lt;/a&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;April 22, 2026.&lt;/strong&gt; Anthropic's Claude Code Max plan jumped to $100 a month. I ran a live three-way AI race on the exact same prompt — Gemma 31B local, Llama 70B local, and Claude cloud — on a single MacBook, to see how close a free local stack gets to the paid cloud. Two of three contestants finished with zero cloud calls.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you just want the video, it's here: &lt;a href="https://www.youtube.com/watch?v=2KeTDDodE0A" rel="noopener noreferrer"&gt;FREE AI on a MacBook vs Claude Cloud — Hexagon Shootout&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want the repo, it's here: &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Keep reading for the setup, the numbers, and the three things that surprised me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup — same prompt, three contestants
&lt;/h2&gt;

&lt;p&gt;Hardware: M5 Max MacBook Pro, 128 GB unified memory, Apple Silicon.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 31B&lt;/strong&gt; — local, Apple MLX, 4-bit quantized (Google's code-specialized model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 70B&lt;/strong&gt; — local, Apple MLX, 8-bit quantized (Meta's generalist)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude cloud&lt;/strong&gt; — the real Anthropic API, using Claude Code unchanged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same prompt to every contestant:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Build a single HTML file with inline JavaScript that shows a ball bouncing inside a rotating hexagon. Include gravity and realistic bounce physics.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Simple enough that the answer should be a few kilobytes of code. Interesting enough that it exposes how well a model handles real math — collision detection against rotating geometry, energy conservation, boundary clamping. When models trip, they trip here.&lt;/p&gt;

&lt;p&gt;Every run was recorded end-to-end with a live stats panel: elapsed seconds, output bytes, tokens-per-second. No cherry-picking, no post-hoc edits to the physics code, no "here's what it SHOULD have said." What you see is what came out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Contestant&lt;/th&gt;
&lt;th&gt;Time to ship working HTML&lt;/th&gt;
&lt;th&gt;Tokens/sec&lt;/th&gt;
&lt;th&gt;Cloud calls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude cloud&lt;/td&gt;
&lt;td&gt;22 s&lt;/td&gt;
&lt;td&gt;N/A (data center)&lt;/td&gt;
&lt;td&gt;yes (via API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 31B local&lt;/td&gt;
&lt;td&gt;56 s&lt;/td&gt;
&lt;td&gt;~30&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;zero&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 70B local&lt;/td&gt;
&lt;td&gt;2:17&lt;/td&gt;
&lt;td&gt;~11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;zero&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude cloud finished first — it's a data center somewhere. Gemma 31B finished clean in under a minute with working physics. Llama 70B took the longest and produced the most verbose output, but also landed a working demo in the end.&lt;/p&gt;

&lt;p&gt;The headline isn't that one is "best." It's that two of the three ran with Wi-Fi that could have been off the entire time. That's the number that matters for anyone dealing with NDAs, PHI, client files, or just a flight without connectivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three things that surprised me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Bigger isn't better when "bigger" is a generalist
&lt;/h3&gt;

&lt;p&gt;I went in expecting Llama 70B to beat Gemma 31B on code quality. It's more than twice the parameter count. Gemma beat Llama cleaner and faster on this specific task.&lt;/p&gt;

&lt;p&gt;Why: Gemma 4 is a Google model fine-tuned heavily for coding and math. Llama 3.3 70B is Meta's generalist — it's excellent at conversation, reasoning, creative writing, but it wasn't tuned to punch above its weight on HTML canvas physics.&lt;/p&gt;

&lt;p&gt;If you're buying a local model for coding, you're better off with a 30B that's code-tuned than a 70B that's general. Don't count parameters, read the model card.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Claude Code's harness chokes local models
&lt;/h3&gt;

&lt;p&gt;Claude Code (the CLI agent) sends a 29,000-token system prompt with 60 tool schemas in every request. That's tuned for the cloud — where a frontier model can happily chew through 30K tokens of context before even starting. On a local 70B, that prefill takes a minute or two before generation begins.&lt;/p&gt;

&lt;p&gt;When I bypassed Claude Code and hit the MLX server directly with just the prompt, Llama 70B's wall-clock time dropped from 7+ minutes to under 2.&lt;/p&gt;

&lt;p&gt;The tradeoff: without Claude Code's harness you lose the Write/Edit/Bash tool-use loop, so you can't use Claude Code as an agent, only as a generator. For research, benchmarking, or any single-shot prompt, direct is way faster. For actual coding sessions, the overhead is real but it's what buys you the agent loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Circle-approximation collision is the cheat code
&lt;/h3&gt;

&lt;p&gt;All three models eventually produced a bouncing ball. The ones that worked used &lt;strong&gt;circle-approximation collision&lt;/strong&gt; — treat the hexagon as a circle of its apothem radius for collision purposes, reflect velocity when the ball exceeds that radius, clamp the ball back to exactly inside. Five lines of math, reliable, hexagon can rotate as wildly as you want.&lt;/p&gt;

&lt;p&gt;The ones that failed tried to do proper polygon-edge collision — compute the six edges of the rotating hexagon each frame, compute point-to-line distance for each, reflect off the appropriate edge. That's the "right" way, and it fails constantly because floating-point error lets the ball slip through edges during the rotation, and then the model doesn't know how to clamp it back.&lt;/p&gt;

&lt;p&gt;I wouldn't have predicted this. The "simple" approximation is strictly better for the demo because it can't leak. For anything more complex than one ball, the polygon approach is necessary — but for a benchmark, approximation wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should care
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; on laptops with 64+ GB of Apple Silicon unified memory: you can run this today, your hardware already supports it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone dealing with confidential work&lt;/strong&gt; — lawyers, accountants, doctors, contractors handling NDAs or PHI: the cost isn't $0 vs $100, it's "does your data leave the machine" vs "does it not."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent flyers and people who travel to places with bad internet&lt;/strong&gt;: a 70B model on a laptop keeps working when the plane's Wi-Fi is $18 and throttled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone curious whether Apple's bet on unified memory was actually about AI&lt;/strong&gt;: it was.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to run it yourself
&lt;/h2&gt;

&lt;p&gt;The repo is MIT licensed and open source. Full setup is in the README:&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project pairs a native-MLX Anthropic-API-compatible server with Claude Code. Point Claude Code at &lt;code&gt;localhost:4000&lt;/code&gt; and the official CLI talks to your local model as if it were the cloud API. Swap models with one env var. Ship code without the subscription.&lt;/p&gt;

&lt;p&gt;Around 2,000 stars in the first month. If it's useful, a star helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Claude cloud: $100/mo, 22 seconds to a working hexagon.&lt;/li&gt;
&lt;li&gt;Gemma 31B on my MacBook: $0, 56 seconds to a working hexagon.&lt;/li&gt;
&lt;li&gt;Llama 70B on my MacBook: $0, 2:17 to a working hexagon.&lt;/li&gt;
&lt;li&gt;Two of three ran with zero cloud calls.&lt;/li&gt;
&lt;li&gt;Free AI on Apple Silicon is real, now, for a huge slice of what people use cloud APIs for.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The receipts, in video form: &lt;a href="https://www.youtube.com/watch?v=2KeTDDodE0A" rel="noopener noreferrer"&gt;youtube.com/watch?v=2KeTDDodE0A&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>localllama</category>
      <category>mlx</category>
      <category>applesilicon</category>
    </item>
    <item>
      <title>The Era of Hunched-Over-A-Screen Computing Is Ending — Heres Whats Replacing It</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Wed, 22 Apr 2026 09:54:00 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/the-era-of-hunched-over-a-screen-computing-is-ending-heres-whats-replacing-it-41go</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/the-era-of-hunched-over-a-screen-computing-is-ending-heres-whats-replacing-it-41go</guid>
      <description>&lt;p&gt;Look around any coffee shop, any office, any living room. Everyone is bent forward at the same angle, staring into a glowing rectangle, with one hand on a small slab and the other on a bigger slab. The whole posture is wrong. We know it’s wrong — that’s why ergonomic chairs are a $2 billion industry — but we keep doing it because the computers we built require it.&lt;/p&gt;

&lt;p&gt;I think we’re at the end of that era. Not because somebody invented a magic new screen. Because computing itself is finally able to leave the rectangle.&lt;/p&gt;

&lt;p&gt;I call what’s coming &lt;strong&gt;ambient computing&lt;/strong&gt;. The phrase isn’t new, but most uses of it are about smart speakers or watches — small devices that ask you to look at them too. That’s not what I mean. I mean a way of working with computers that doesn’t require you to face a screen at all. Where the machine listens, talks back, sees what you see, and the keyboard becomes optional rather than mandatory.&lt;/p&gt;

&lt;p&gt;The pieces of it are already shipping. They just haven’t been assembled.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ambient computing actually looks like
&lt;/h2&gt;

&lt;p&gt;Sitting in a hot tub a few weeks ago, I sent a text from my phone: &lt;em&gt;“find me the best rated electric guitar at this price range, screenshot it, and text it back to me.”&lt;/em&gt; Two minutes later my phone buzzed with the screenshot. The Mac on my desk had searched, found, captured, and sent back, while I stayed in the tub.&lt;/p&gt;

&lt;p&gt;That’s an ambient-computing moment. No screen. No keyboard. The computer was a participant in what I was doing rather than the thing I had to stop and walk over to.&lt;/p&gt;

&lt;p&gt;The same week, I had a hands-free coding session — speaking into the room, hearing a cloned version of my own voice narrate what the AI was doing, course-correcting verbally. No mouse. No keyboard. No screen-watching. The work got done. The AI told me when it was done. I went on with my day.&lt;/p&gt;

&lt;p&gt;Both of these worked on &lt;strong&gt;hardware I already owned&lt;/strong&gt;. A MacBook Pro on the desk. An iPhone in my pocket. The pieces that turned them into an ambient system are open source and free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three pieces that already exist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Local AI.&lt;/strong&gt; A current MacBook Pro can run a 70-billion-parameter language model entirely on the GPU side of its unified memory. That model is good enough to write code, draft documents, summarize content, and run multi-step tool-using workflows. It does this with no internet and no subscription. The model lives on the machine; the inference happens on the machine.&lt;/p&gt;

&lt;p&gt;The fact that this is true on consumer hardware is a recent development. It wasn’t true two years ago. And it’s the foundation of everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. On-device speech.&lt;/strong&gt; Apple’s &lt;code&gt;SFSpeechRecognizer&lt;/code&gt; — the same engine that powers the dictation feature in macOS — runs entirely on your Mac. You can wrap it in a continuous-listening daemon and have it transcribe everything you say into a target window, no cloud round-trip. Pair it with a local TTS engine running a cloned version of your own voice (the cloning runs on the Mac too) and you have full speech in, full speech out, neither end touching a network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Phone-as-remote.&lt;/strong&gt; iMessage on a Mac can be driven by AppleScript. That means anything your Mac can do — search, code, browse, compose — can be triggered by a text from your phone. The phone becomes a remote for the more powerful machine, and the more powerful machine handles the heavy lift while you’re somewhere else.&lt;/p&gt;

&lt;p&gt;Stack those three together and you have a workflow where:&lt;br&gt;&lt;br&gt;
– You can ask the Mac to do something while you’re nowhere near it.&lt;br&gt;&lt;br&gt;
– You can hold a spoken conversation with it without typing or looking.&lt;br&gt;&lt;br&gt;
– It can produce real work — code, documents, research, video — and deliver it back to wherever you are.&lt;/p&gt;

&lt;p&gt;That’s ambient computing. Not Siri. Not Alexa. The full deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters now
&lt;/h2&gt;

&lt;p&gt;Two arguments. The boring one: the &lt;strong&gt;bodily cost&lt;/strong&gt; of screen-and-keyboard computing is real and accumulating. Carpal tunnel, posture damage, eye strain, the chair-and-desk economy that exists to patch over the damage we’re doing to ourselves. We’ve been pretending this is fine for thirty years. It’s not.&lt;/p&gt;

&lt;p&gt;The interesting one: ambient computing is what makes a different &lt;em&gt;relationship&lt;/em&gt; with the machine possible. When the computer is something you face for eight hours a day, it occupies a specific role in your life — interrupt-driven, attention-stealing, mostly adversarial to whatever else you wanted to be doing. When the computer is something you talk to in passing, hand things off to, and check back on later, it occupies a completely different role. It becomes a colleague rather than a chore.&lt;/p&gt;

&lt;p&gt;We’re not going to fully arrive there in 2026. But the building blocks are shipping in 2026, and the people who set them up now will look up in two years and realize their working life feels different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The catch
&lt;/h2&gt;

&lt;p&gt;For now, all of this requires being &lt;strong&gt;on a Mac&lt;/strong&gt;. Specifically, an Apple Silicon Mac with enough unified memory to run a real model — practically, that means an M2 Max / M3 Max / M4 Pro / M5 Max with 32 GB minimum, 64 GB+ for the bigger models. That’s an expensive piece of hardware.&lt;/p&gt;

&lt;p&gt;But it’s a piece of hardware most professionals already own, or could justify. And it’s the only piece you need. There’s no recurring AI subscription. No hosting bill. No phone-home telemetry that compromises the whole privacy story.&lt;/p&gt;

&lt;p&gt;The gear that gets you into ambient computing is gear you might already have. You just haven’t connected the pieces yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m building toward
&lt;/h2&gt;

&lt;p&gt;The longer arc, for me, is robotics. Specifically a Lego-like modular system where you clip together small parts to build whatever the moment needs — a robot arm, a camera mount, a wheeled base — all driven by the same local AI vision system that runs everywhere else in the stack. That’s a few years out.&lt;/p&gt;

&lt;p&gt;In the meantime, I’m shipping the parts of the system that work today. The local-AI server is open source (&lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;claude-code-local&lt;/a&gt;). The voice loop is open source (&lt;a href="https://github.com/nicedreamzapp/NarrateClaude" rel="noopener noreferrer"&gt;NarrateClaude&lt;/a&gt;). The browser agent is open source (&lt;a href="https://github.com/nicedreamzapp/browser-agent" rel="noopener noreferrer"&gt;browser-agent&lt;/a&gt;). The phone bridge is open source. The iPhone object-detection app that’s part of the same vision is on the App Store (&lt;a href="https://apps.apple.com/us/app/realtime-ai-cam/id6751230739" rel="noopener noreferrer"&gt;RealTime AI Cam&lt;/a&gt;) for free.&lt;/p&gt;

&lt;p&gt;If any of this resonates — if you’ve been quietly tired of being chained to a screen, or you can feel the future being built but haven’t been able to put your finger on what it is — clone something, run it, and tell me what you find. Most of the work ahead is figuring out which pieces fit where, and that’s not work I can do alone.&lt;/p&gt;

&lt;p&gt;The era of hunched-over-a-screen is ending. The next era is being built in the open, on commodity hardware, by people who decided to stop waiting for someone else to do it.&lt;/p&gt;

&lt;p&gt;— Matt&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://nicedreamzwholesale.com/software/" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt; lineup. If you want this set up inside a firm or practice — private, on-device, no cloud — that’s &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>ambientcomputingloca</category>
    </item>
    <item>
      <title>What Its Actually Like to Code By Voice — With the AI Replying In My Own Cloned Voice</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Wed, 22 Apr 2026 09:53:21 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/what-its-actually-like-to-code-by-voice-with-the-ai-replying-in-my-own-cloned-voice-32mc</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/what-its-actually-like-to-code-by-voice-with-the-ai-replying-in-my-own-cloned-voice-32mc</guid>
      <description>&lt;p&gt;The closest analogy I can give for what this feels like is having a quiet co-worker in the room who happens to sound exactly like you. You think out loud. They respond out loud. You both work on the same code. Neither of you is touching a keyboard.&lt;/p&gt;

&lt;p&gt;It’s still a little uncanny. But it’s also the most natural way to work I’ve found in twenty-plus years of writing software.&lt;/p&gt;

&lt;p&gt;The setup runs entirely on my MacBook. Apple’s on-device speech recognition listens for me. A local language model thinks. A cloned-voice text-to-speech says the response back. Nothing leaves the laptop. Nothing requires a network. The whole loop is on-device, and that turns out to matter for reasons I didn’t expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it actually works
&lt;/h2&gt;

&lt;p&gt;A compiled Swift binary wraps Apple’s &lt;code&gt;SFSpeechRecognizer&lt;/code&gt; — the same engine that powers macOS dictation — in a continuous-listening daemon. It transcribes everything I say into the active terminal window where Claude Code is running. End-of-utterance is detected by a stability heuristic: if the recognized text stops changing for about 2.5 seconds, the recognizer treats that sentence as final and submits it.&lt;/p&gt;

&lt;p&gt;That submission gets injected into Claude Code via AppleScript, addressed to a specific window by ID so it can’t leak into whatever else is open. Claude Code processes the request against a local language model running on MLX (Apple’s native ML framework). The response comes back as text in the terminal — and a separate launcher pipes that text into a TTS engine running a cloned version of my voice. The reply plays through the speakers. The listener auto-pauses while audio is playing, so the model’s spoken reply never gets picked up as a new prompt. Then it resumes listening for the next thing I say.&lt;/p&gt;

&lt;p&gt;End-to-end latency, on a current MacBook, is around two to three seconds. Fast enough to feel like a conversation. Slow enough that you notice it’s a different kind of pacing than typing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What surprises you the first hour
&lt;/h2&gt;

&lt;p&gt;The first thing that surprised me is how much I &lt;strong&gt;already&lt;/strong&gt; narrate while coding. The interior monologue — &lt;em&gt;“okay let me look at the test, that’s failing because the path is wrong, let me grep for the constant, oh it’s in a different file, fix that here…”&lt;/em&gt; — turns out to be most of how I work anyway. Speaking it out loud changed nothing about my reasoning. It just routed it to a different output channel.&lt;/p&gt;

&lt;p&gt;The second thing that surprised me is &lt;strong&gt;how much faster context-switching gets&lt;/strong&gt;. When you type, you have to break to compose. When you speak, you can just keep going. &lt;em&gt;“That’s done — now check the function signature in the parent class — yeah okay update the docstring to match — git status — looks good, commit it.”&lt;/em&gt; Five tasks, no pause, no posture change.&lt;/p&gt;

&lt;p&gt;The third surprise is the &lt;strong&gt;physical&lt;/strong&gt; difference. After half a day of voice-driven work I’m not stiff. My eyes aren’t tired. I haven’t held a clamshell wrist position for hours. There’s a real bodily cost to the way we normally use computers, and removing it feels like removing a weight you didn’t know was there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why on-device matters specifically for this
&lt;/h2&gt;

&lt;p&gt;You can build a voice-driven coding setup with cloud APIs. Whisper for speech-in, ElevenLabs for speech-out, GPT or Claude for the brain. Many people do. The result is a tool that works great until your wifi gets weird, your API key hits a rate limit, your monthly bill arrives, or you realize you’ve just sent every word you said in front of your laptop today to three different vendors’ servers.&lt;/p&gt;

&lt;p&gt;The on-device version doesn’t have any of those failure modes. It works on a plane. It works in a Faraday cage. It works when the rest of the internet is on fire. The bill is a one-time hardware purchase, not a perpetual subscription. And nothing — no audio, no text, no inference request — ever crosses the network. For me that’s the difference between an interesting demo and a tool I actually use day-to-day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cloned voice is a real thing, not a gimmick
&lt;/h2&gt;

&lt;p&gt;The cloned voice is the part everyone reacts to first. When the AI reads its response in your own voice, your nervous system files it under “internal monologue” rather than “external announcement.” It’s a smoother experience than a stranger’s TTS voice and it doesn’t pull your attention the same way.&lt;/p&gt;

&lt;p&gt;But it works because the cloning is also on-device. The voice clone trains and runs locally — Pocket TTS in my case, but other local TTS engines slot in if you have a preference. Cloud voice services would mean my own voice (and everything I make it say) is sitting on someone’s server. Not interested.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it falls short today
&lt;/h2&gt;

&lt;p&gt;Three real limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Domain-specific vocabulary.&lt;/strong&gt; Apple’s recognizer is excellent at general English, less excellent at obscure software terms, library names, and acronyms. &lt;em&gt;“Refactor the YOLOv8 inference loop”&lt;/em&gt; often comes through as &lt;em&gt;“refactor the yellow vate inference loop.”&lt;/em&gt; The fix is a custom vocabulary file you can register with the recognizer; that closes most of the gap but takes setup time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Background noise.&lt;/strong&gt; A quiet office is fine. A coffee shop is workable. A hot tub with the jets running, surprisingly, also fine. A room with kids and a dog is harder. The continuous-listen mode is robust but not magic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Long pauses.&lt;/strong&gt; If you stop talking to think for 20 seconds, the recognizer will sometimes finalize a partial sentence that wasn’t done yet, and you have to restart it. Workable but a real friction point I’m still iterating on.&lt;/p&gt;

&lt;p&gt;None of these are fundamental. All of them get better as the recognition models get better, which they’re doing every macOS release.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this unlocks for me personally
&lt;/h2&gt;

&lt;p&gt;Coding while pacing the room. Coding while cooking. Coding while in the hot tub. (Yes, really. The mac is on a desk; my voice carries; I check back in by walking up to the screen when something needs visual confirmation.) Holding voice work sessions that last for hours without my body breaking down.&lt;/p&gt;

&lt;p&gt;It also lets me work in spaces that aren’t desks. Most of my best thinking happens away from a screen anyway — the keyboard part was always the bottleneck. Removing it doesn’t make me think differently. It makes the time I spend actually capturing the thinking more honest about how that thinking happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want to try it
&lt;/h2&gt;

&lt;p&gt;The local-AI server is at &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt;. The voice listener and dispatcher are at &lt;a href="https://github.com/nicedreamzapp/NarrateClaude" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/NarrateClaude&lt;/a&gt;. Both are MIT-licensed, both run entirely on a MacBook with Apple Silicon, and both ship with double-click launchers so the install is closer to “set up an app” than “build a system.”&lt;/p&gt;

&lt;p&gt;You will spend an evening getting it tuned to your voice and your vocabulary. After that, it just works. And the working life it produces is, in my experience, qualitatively different from screen-and-keyboard.&lt;/p&gt;

&lt;p&gt;— Matt&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://nicedreamzwholesale.com/software/" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt; lineup. If you’re a firm exploring private on-device AI, &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt; is the engagement I do for setting this up inside law / medical / accounting practices.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>voicecodingwhisperco</category>
    </item>
    <item>
      <title>A Field Guide to Ambient Computing — The Words for the Thing Thats Coming</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Wed, 22 Apr 2026 09:53:15 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/a-field-guide-to-ambient-computing-the-words-for-the-thing-thats-coming-4ngp</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/a-field-guide-to-ambient-computing-the-words-for-the-thing-thats-coming-4ngp</guid>
      <description>&lt;p&gt;I’ve been building toward something I don’t have a clean name for yet. The pieces — local AI, on-device speech, the phone-as-remote, hands-free voice loops — exist and ship today. The pattern they form, when you put them together, doesn’t have a settled vocabulary. So I’m coining one. Not because I want to invent jargon, but because &lt;strong&gt;a thing without a name doesn’t exist for most people&lt;/strong&gt;, and this thing is going to exist for everyone within a few years.&lt;/p&gt;

&lt;p&gt;Here’s the working glossary. Use these terms. Argue with them. Replace them with better ones if you can think of any. The goal is to give shape to a way of working with computers that’s already real but not yet common.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rectangle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; The screen. Short for &lt;em&gt;the glowing rectangle of damage&lt;/em&gt;. The thing you’re hunched in front of right now. The thing that defines posture, attention, work pace, and back pain for most people who use computers professionally.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“I want to spend less of my day inside the Rectangle.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Rectangle is not bad. It’s just been the only option for so long that people forgot it was an option at all rather than the default state of computing. Once you have a working alternative — and you do now — the Rectangle stops being a given. It becomes one of several places work can happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  Off-Screen Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; Productive computing done without facing a screen. The opposite of screen work, not the absence of work. Hands-free voice coding is off-screen work. Texting your Mac from the hot tub and getting back a finished research summary is off-screen work. Listening to your AI narrate a long task in your own cloned voice while you walk around your house is off-screen work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Half my morning was off-screen. I shipped more than I usually do at a desk.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Ambient Computing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; Computing that happens &lt;em&gt;around&lt;/em&gt; you instead of &lt;em&gt;in front of&lt;/em&gt; you. The machine listens, talks back, sees what you point a camera at, and the keyboard becomes optional rather than mandatory. Ambient computing isn’t smart speakers. Smart speakers ask you to talk to a brand. Ambient computing is your own machine, doing your own work, in your own voice, in the room with you.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“I’m building toward ambient computing — a stack you can talk to, hand things off to, and check back on later.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  To Airgap (verb)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Transitive verb.&lt;/strong&gt; To configure an AI workflow such that it runs entirely on local hardware with no outbound network traffic — making client data, prompts, and responses physically incapable of leaving the machine.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“We airgapped the firm’s drafting workflow last week. Nothing they paste into the AI hits the internet.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the verb form of &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt;, the consulting practice I run for firms in regulated industries. It’s also the right word for what you do when you set up your own local-AI stack on a MacBook and turn the wifi off to prove it works. Both of those count.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hand-Off Computing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; A workflow where you give the computer a task, walk away, and it tells you when it’s done. Distinct from interactive computing, where you sit and wait, and from background computing, which you forget about until it crashes. In hand-off computing the machine knows you walked away, finishes the work, and notifies you back through whatever channel you set up — usually a text to your phone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“I just hand-off it the data analysis and go make breakfast. It buzzes my phone when the report’s ready.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Two-Slab Posture
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; The body shape you assume when working with a laptop and a phone simultaneously — head tilted forward, shoulders rounded, both hands pulled in toward the body. The dominant posture of professional computing in 2026, and the source of much of the chronic pain that office workers attribute to “stress.” The Two-Slab Posture is what off-screen work makes optional.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“By 4pm every day I’m locked in the Two-Slab Posture and I can feel it in my neck.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Backpack Supercomputer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; A current-generation laptop with enough on-device compute to run frontier-class AI models locally. Specifically: an Apple Silicon MacBook Pro with 64+ GB of unified memory, of the M2 / M3 / M4 / M5 Max generation. The phrase emphasizes that this hardware fits in a backpack while delivering performance that would have required a server rack five years ago.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“My M5 Max is a backpack supercomputer. I take it to client offices and it runs a 70-billion-parameter model on the train ride.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Ambient Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; The minimum set of pieces required to assemble an ambient-computing setup. As of 2026, on Apple hardware:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;local LLM&lt;/strong&gt; running through an MLX-native server. (&lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;claude-code-local&lt;/a&gt;.)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;continuous-listen daemon&lt;/strong&gt; wrapping &lt;code&gt;SFSpeechRecognizer&lt;/code&gt;. (&lt;a href="https://github.com/nicedreamzapp/NarrateClaude" rel="noopener noreferrer"&gt;NarrateClaude&lt;/a&gt;.)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;cloned-voice TTS&lt;/strong&gt; for spoken responses in the user’s own voice.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;phone bridge&lt;/strong&gt; so iMessage can drive the Mac. (Custom AppleScript.)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;local browser agent&lt;/strong&gt; for web tasks. (&lt;a href="https://github.com/nicedreamzapp/browser-agent" rel="noopener noreferrer"&gt;browser-agent&lt;/a&gt;.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Five components. All open source. All run on hardware you may already own. The entire stack costs $0 in recurring fees once installed.&lt;/p&gt;




&lt;h2&gt;
  
  
  To Whisper-Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Verb.&lt;/strong&gt; To do programming work via voice, with the AI replying in the developer’s own cloned voice. Distinct from voice dictation (which still requires you to be at the screen to read the result). Whisper-coding is an end-to-end conversation about code, in audio, where the developer never has to look at the screen unless they choose to verify something visually.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“I whisper-coded the fix while pacing the kitchen. Saw the diff after lunch and it was right.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Cloned-Voice Loop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; A feedback loop where the AI’s spoken responses are rendered in a TTS clone of the user’s own voice. This makes the response feel less like an external announcement and more like internal monologue, which the human nervous system processes more naturally and at lower cognitive load than a stranger’s voice. The loop runs on-device for the same privacy reasons as the rest of the ambient stack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“After a week with the cloned-voice loop, hearing a stranger TTS feels jarring.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Hot-Tub Coding
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Noun.&lt;/strong&gt; Sending a coding or research task to your Mac from your phone while not being at the Mac. Originally literal — sending the task while in a hot tub — now a generic term for any phone-driven hand-off computing session. The hallmark of hot-tub coding is that the &lt;em&gt;human&lt;/em&gt; is doing something else entirely while the &lt;em&gt;computer&lt;/em&gt; is doing the work the human ordered.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“That whole feature was hot-tub coded. I never sat down to write any of it.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Local-First AI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Adjective phrase.&lt;/strong&gt; A system architecture in which AI inference defaults to running on the user’s own device, with cloud as a fallback used only when local can’t handle the task — &lt;em&gt;not&lt;/em&gt; the other way around. The cultural and technical opposite of &lt;em&gt;cloud-first AI&lt;/em&gt;, which has been the default since 2019. Local-first AI is the architecture every privacy-sensitive industry is now going to need, whether they realize it yet or not.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“For NDA work, the only sane architecture is local-first AI.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  A Note On Ownership
&lt;/h2&gt;

&lt;p&gt;These words don’t belong to me. If you find them useful, use them. If you build on the stack and coin a better word for something I’ve named here, I’ll switch to using yours. The point of giving things names is to make them discussable, not to lock down a vocabulary.&lt;/p&gt;

&lt;p&gt;But I do want the &lt;strong&gt;pattern&lt;/strong&gt; they describe to take hold. We have, in 2026, the technology to fundamentally change how working with computers feels — physically, mentally, ergonomically, financially. Most people don’t know it yet because the pattern doesn’t have a name they recognize. These are the names I think will help.&lt;/p&gt;

&lt;p&gt;If any of this resonates: clone the &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;open-source stack&lt;/a&gt;, assemble your own ambient setup, and tell me what you find. The next decade of computing isn’t being decided in any one company’s roadmap. It’s being decided by who shows up and starts using the parts that already exist.&lt;/p&gt;

&lt;p&gt;— Matt&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://nicedreamzwholesale.com/software/" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt; lineup. If you’re a firm that wants the ambient stack installed and air-gapped for compliance reasons, &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt; is the engagement I do for that.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>ambientcomputingvoca</category>
    </item>
    <item>
      <title>Your Medical Practice Is Probably Using Cloud AI on PHI Right Now — Heres the HIPAA Problem Nobody Is Talking About</title>
      <dc:creator>Matt Macosko</dc:creator>
      <pubDate>Wed, 22 Apr 2026 09:27:41 +0000</pubDate>
      <link>https://dev.to/matt_macosko_f3829cfd86b8/your-medical-practice-is-probably-using-cloud-ai-on-phi-right-now-heres-the-hipaa-problem-nobody-42nf</link>
      <guid>https://dev.to/matt_macosko_f3829cfd86b8/your-medical-practice-is-probably-using-cloud-ai-on-phi-right-now-heres-the-hipaa-problem-nobody-42nf</guid>
      <description>&lt;p&gt;Walk into any small medical practice today and ask the front-desk staff if they’ve ever pasted a chart note into ChatGPT to “rewrite this so the patient understands it” or to “summarize this lab result.” A lot of them will say yes. Some will say no but their browser history says otherwise. A few will look genuinely surprised that anyone’s asking.&lt;/p&gt;

&lt;p&gt;Here’s what they’re not thinking about: protected health information (PHI) under HIPAA includes more than the obvious identifiers. It includes anything that, in combination, could identify a patient — symptoms plus visit date, lab values plus condition, even free-text descriptions if specific enough. Once that text leaves the practice’s network and lands in a cloud AI service, the practice has technically engaged that AI vendor as a business associate, and a Business Associate Agreement (BAA) is required. The big AI providers offer BAAs only on enterprise tiers — usually $$$$ a month. Most practices using ChatGPT or Claude on patient data have no BAA in place at all.&lt;/p&gt;

&lt;p&gt;That’s a HIPAA breach waiting to be discovered. And it’s already everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is suddenly a real problem
&lt;/h2&gt;

&lt;p&gt;For years, the assumption was that nobody on staff would use a “ChatGPT” on actual patient data — it’d be obvious that PHI shouldn’t go to a third-party server. That assumption no longer holds. The tools are too useful. Practice managers, MAs, billers, NPs, even physicians are routinely pasting things in: prior-auth letters, patient instructions, summary letters to referring providers, insurance appeal templates, lab interpretations.&lt;/p&gt;

&lt;p&gt;Each of those sessions, on the major cloud AI providers, is potentially a HIPAA-reportable event. The practice doesn’t know it. The vendor doesn’t know who the patient is. But under the regulation, the disclosure happened the moment the text crossed the network boundary without a BAA in place.&lt;/p&gt;

&lt;p&gt;OCR enforcement actions in the past few years have been heavy on exactly this kind of “we didn’t realize we were using a third-party processor” finding. Penalties for unintentional disclosure under the &lt;strong&gt;HIPAA Privacy Rule&lt;/strong&gt; start at $137 per violation and can reach $68,928 per violation depending on culpability — and “violations” can be counted per record disclosed. A single staff member pasting 30 patient summaries into ChatGPT over a quarter is, by the regulation’s math, 30 violations.&lt;/p&gt;

&lt;p&gt;Most practices are not okay if they’re audited tomorrow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix that actually works: on-device AI
&lt;/h2&gt;

&lt;p&gt;The practice doesn’t have to give up AI. It just has to keep it on the practice’s own machines, where there’s no third-party processor relationship.&lt;/p&gt;

&lt;p&gt;Modern Apple Silicon Macs — the kind a lot of practices already have at the front desk or in clinician offices — can run open-weight language models locally. The model runs in the Mac’s unified memory using Apple’s MLX framework. Prompts and responses never touch a network connection. There’s no API key, no vendor account, no outbound traffic to log or audit.&lt;/p&gt;

&lt;p&gt;For HIPAA purposes, this changes the legal posture entirely. Software running on a covered entity’s own hardware, with no data transmission outside the entity’s secured environment, is &lt;strong&gt;not&lt;/strong&gt; a third-party disclosure. It’s the same legal category as a Word document — local software processing data the practice already has lawful access to.&lt;/p&gt;

&lt;p&gt;The HIPAA Security Rule still applies (the Mac itself needs to be physically secured, encrypted at rest, with access controls), but those are the same controls the practice already runs for its EMR workstation. No new vendor risk. No new BAA. No quarterly compliance review of the AI provider’s SOC 2 report.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like inside a practice
&lt;/h2&gt;

&lt;p&gt;A typical small-practice install:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2-5 MacBooks (front desk, clinician workstations, billing). Most practices already have these.&lt;/li&gt;
&lt;li&gt;An open-source MLX server installed on each, running a 31B or 70B language model.&lt;/li&gt;
&lt;li&gt;A simple chat interface on the desktop. Looks and feels like ChatGPT. Behaves the same. Just doesn’t phone anywhere.&lt;/li&gt;
&lt;li&gt;A one-page &lt;strong&gt;HIPAA AI Use Policy&lt;/strong&gt; documenting that the practice’s AI tools run on-premises with no third-party data processors. This goes in the practice’s compliance binder.&lt;/li&gt;
&lt;li&gt;An hour of staff training on what tasks make sense for the local AI vs. what should still go through the EMR.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After install, the practice’s AI usage is HIPAA-clean. Nothing to add to the BAA log. Nothing to disclose to patients. Nothing to argue about in an audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The specific wins for a medical practice
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patient instructions in plain language.&lt;/strong&gt; Convert “post-op care: keep wound site dry x 5 days, rotate dressing q12h, NSAIDs prn” into a paragraph the patient will actually read. Local model, no PHI exposure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prior auth letters.&lt;/strong&gt; Drafting these from chart notes is a huge time sink. Local AI can generate the first draft from the relevant note, with the chart never leaving the practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insurance appeals.&lt;/strong&gt; Same pattern. The AI sees the denial letter and the relevant clinical history; the practice’s data stays local.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letters to referring providers.&lt;/strong&gt; Clean, professional, fast — without sending the patient’s chart to a cloud LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Patient education content&lt;/strong&gt; customized to the practice (not the same generic handouts every other clinic uses).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are dramatic. All of them are time savers worth tens of hours per month per provider. And every one of them is safe to do with on-device AI in a way that’s genuinely not safe to do with cloud AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the cost actually looks like
&lt;/h2&gt;

&lt;p&gt;A small practice using cloud AI properly (with a BAA-covered enterprise tier) is looking at $40-100 per user per month, plus the legal and compliance overhead of vetting the vendor and adding them to the BAA log. That’s $5,000-15,000+ per year in subscription cost for a 5-person practice, before any compliance staff time.&lt;/p&gt;

&lt;p&gt;A one-time on-device install for the same practice runs &lt;strong&gt;$8,000 to $15,000&lt;/strong&gt; all-in (hardware aside — most practices already have the Macs). After that: zero recurring AI subscription cost. The AI runs on hardware the practice already owns, indefinitely.&lt;/p&gt;

&lt;p&gt;The financial argument is real, but it’s secondary to the compliance argument. The compliance argument is: &lt;strong&gt;on-device AI is the only AI configuration that doesn’t create a HIPAA business-associate relationship.&lt;/strong&gt; That’s not a marginal advantage. That’s a categorical difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should be looking at this now
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solo and small group practices&lt;/strong&gt; doing primary care, behavioral health, dermatology, OB/GYN, mental health, dentistry — anywhere clinicians are tempted to use AI on chart text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Therapy and counseling practices&lt;/strong&gt; where session notes are particularly sensitive and where most cloud AI tools are an obvious non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concierge / direct-pay practices&lt;/strong&gt; where patients explicitly chose the practice for higher privacy expectations than chain medicine offers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Practices that already had a HIPAA scare&lt;/strong&gt; — a near-miss, an OCR letter, a malware incident — where the leadership now takes data flow questions seriously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any practice in California, New York, Massachusetts, or other states&lt;/strong&gt; with privacy laws that exceed HIPAA in scope.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the practice’s leadership doesn’t know exactly what AI tools the staff are currently using on patient text, that’s the answer to “should we look at this.” The fix is not to ban AI (it’ll go underground), it’s to give the practice an AI that doesn’t create a vendor-risk problem.&lt;/p&gt;




&lt;p&gt;I do on-device AI installations for small medical and therapy practices — fixed-fee, one week start to finish, including the HIPAA AI Use Policy and staff training. If your practice is quietly accumulating AI usage without a clear compliance posture, this is the cleanest fix on the market.&lt;/p&gt;

&lt;p&gt;More detail: &lt;a href="https://nicedreamzwholesale.com/airgap/" rel="noopener noreferrer"&gt;AirGap AI&lt;/a&gt; — book a 15-minute call from that page and I’ll walk through whether on-device is the right fit for your specific setup.&lt;/p&gt;

&lt;p&gt;— Matt Macosko, &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz LLC&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The open-source software the install is built on is public at &lt;a href="https://github.com/nicedreamzapp/claude-code-local" rel="noopener noreferrer"&gt;github.com/nicedreamzapp/claude-code-local&lt;/a&gt; — you or your IT contractor can review exactly what runs on practice hardware.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://marijuanaunion.com" rel="noopener noreferrer"&gt;Marijuana Union&lt;/a&gt;. For premium vaporizers visit &lt;a href="https://ineedhemp.com" rel="noopener noreferrer"&gt;iNeedHemp&lt;/a&gt;, wholesale at &lt;a href="https://nicedreamzwholesale.com" rel="noopener noreferrer"&gt;Nice Dreamz&lt;/a&gt;, and seeds at &lt;a href="https://tribeseedbank.com" rel="noopener noreferrer"&gt;Tribe Seed Bank&lt;/a&gt;. Explore the 3D cannabis marketplace at &lt;a href="https://marijuanaunion.com/marketplace/" rel="noopener noreferrer"&gt;The Farmstand&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>medicalpracticeshipa</category>
    </item>
  </channel>
</rss>
