<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nic Lydon</title>
    <description>The latest articles on DEV Community by Nic Lydon (@niclydon).</description>
    <link>https://dev.to/niclydon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908081%2F03e413b1-9c57-4561-9b6e-3ee8b60bc188.png</url>
      <title>DEV Community: Nic Lydon</title>
      <link>https://dev.to/niclydon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/niclydon"/>
    <language>en</language>
    <item>
      <title>25 Months of Waiting, 12 Hours of Work</title>
      <dc:creator>Nic Lydon</dc:creator>
      <pubDate>Mon, 04 May 2026 19:02:59 +0000</pubDate>
      <link>https://dev.to/niclydon/25-months-of-waiting-12-hours-of-work-14ch</link>
      <guid>https://dev.to/niclydon/25-months-of-waiting-12-hours-of-work-14ch</guid>
      <description>&lt;p&gt;For two years, the ring whispered to me.&lt;/p&gt;

&lt;p&gt;Not literally. But every few weeks, an email would arrive. A Kickstarter update. A shipping delay. A manufacturing setback. A promise that it was almost ready. Each one a small reminder that somewhere in South Korea, a team was trying to fit a microphone, a Bluetooth radio, and an IMA ADPCM audio codec into a titanium band that fits on your finger.&lt;/p&gt;

&lt;p&gt;This is the story of the WIZPR Ring: how I found it, how I waited for it, and how I reverse-engineered its entire undocumented BLE protocol the night it arrived.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Whisper
&lt;/h2&gt;

&lt;p&gt;In February 2024, I put down a $5 deposit on a pre-launch page for something called the WHSP Ring. A voice-interaction wearable. Press a button on your finger, whisper a command, and an AI assistant on your phone processes it. The form factor was the thing that caught me. Not another watch, not another earbud. A ring.&lt;/p&gt;

&lt;p&gt;A month later, they renamed it. "WHSP RING is becoming WIZPR RING," the email said. The Kickstarter launched March 20th. I backed it March 21st.&lt;/p&gt;

&lt;p&gt;The campaign funded successfully. 1,084 backers, $163K raised. Surveys went out. I picked my size, my color, my shipping address.&lt;/p&gt;

&lt;p&gt;And then the waiting began.&lt;/p&gt;

&lt;h2&gt;
  
  
  43 Updates
&lt;/h2&gt;

&lt;p&gt;If you've ever backed hardware on Kickstarter, you know the arc. The early updates are optimistic. Tooling has begun. CNC machining looks great. The app is coming along.&lt;/p&gt;

&lt;p&gt;Then reality sets in.&lt;/p&gt;

&lt;p&gt;Update #10 (August 2024): "We regret to inform you that the promised delivery date has arrived, but we have not yet shipped your orders."&lt;/p&gt;

&lt;p&gt;Update #12 (September 2024): Antenna redesign required. Hardware changes.&lt;/p&gt;

&lt;p&gt;Update #15 (November 2024): "Important Update on Shipping Delays and Our Sincere Apology."&lt;/p&gt;

&lt;p&gt;Update #24 (August 2025): "Shipping was promised for Q3 2024, yet we are now almost a full year late."&lt;/p&gt;

&lt;p&gt;Update #36 (December 2025): "It breaks our hearts and fills us with a deep sense of regret to think that many of you supported our project with the hope of receiving the Wizpr Ring as a gift."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo4104xo52liolgapmpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo4104xo52liolgapmpp.png" alt="Kickstarter Notifications" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Forty-three updates over twenty-five months. Antenna issues, titanium PVD coating problems, a charging pin redesign, a full manufacturing partner switch. Each email another whisper. Still here. Still coming. Not yet.&lt;/p&gt;

&lt;p&gt;I never asked for a refund. The ring had its hooks in me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saturday, 3:53 PM
&lt;/h2&gt;

&lt;p&gt;On May 2nd, 2026, my building's package notification system sent me an email: "A package for Nicholas has arrived to the package room."&lt;/p&gt;

&lt;p&gt;The carrier was YunTrack, with a last-mile handoff to GOFO. Twenty-five months and eleven days after I backed it on Kickstarter, the ring was in my hands.&lt;/p&gt;

&lt;p&gt;I charged the case. I paired it to my phone. I opened the official WIZPR app, pressed the button, spoke a command, and watched it work.&lt;/p&gt;

&lt;p&gt;And then I did what any reasonable person would do.&lt;/p&gt;

&lt;p&gt;I opened my laptop and started taking the ring apart, digitally.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Was Known
&lt;/h2&gt;

&lt;p&gt;The first thing I did was search. Someone, somewhere, had to have looked at this thing over BLE already.&lt;/p&gt;

&lt;p&gt;I found &lt;a href="https://github.com/R-D-BioTech-Alaska/Wizpr-Suite" rel="noopener noreferrer"&gt;R-D-BioTech-Alaska/Wizpr-Suite&lt;/a&gt; on GitHub. A small project that had done the genuinely hard first step: figuring out how to connect to the ring over BLE at all, building a GATT inspector, wiring up the &lt;code&gt;bleak&lt;/code&gt; Python library on macOS, and recognizing that the ring's protocol was completely undocumented. Their framing was clear: the path forward is user-controlled reverse engineering.&lt;/p&gt;

&lt;p&gt;I forked it, cloned it, and started reading.&lt;/p&gt;

&lt;p&gt;The ring uses a single BLE service with seven characteristics. Some notify. Some accept writes. None of them are documented by the manufacturer. The official iOS app connects, does its thing, and doesn't explain how. The repo had the scaffolding to connect and listen. What it didn't have was a map of what the ring was actually saying.&lt;/p&gt;

&lt;p&gt;That became the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Overnight
&lt;/h2&gt;

&lt;p&gt;By 3:34 PM, I had my first commit: fixing dataclass decorators in the forked code so it would actually run. By 3:56 PM, the BLE scanner was filtering for WIZPR RING devices, connecting, and dumping GATT characteristics to the console.&lt;/p&gt;

&lt;p&gt;What I found was surprisingly clean. The ring speaks plain ASCII text on characteristic &lt;code&gt;00000007&lt;/code&gt;. Press the button, and it sends &lt;code&gt;CLICK&lt;/code&gt;. Raise your hand to your mouth, and it sends &lt;code&gt;MIC_PRE_ON&lt;/code&gt;, then &lt;code&gt;MIC_ON&lt;/code&gt;. Lower your hand, &lt;code&gt;MIC_OFF&lt;/code&gt;. Send it &lt;code&gt;BATTERY&lt;/code&gt; and it replies with the voltage. Send &lt;code&gt;GET_VERSION&lt;/code&gt; and it tells you its firmware version. Four commands in, six notifications out. No binary protocol, no handshake, no session negotiation. Connect, subscribe, listen.&lt;/p&gt;
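The notification grammar is simple enough that a client can dispatch on the first token. A minimal sketch, using the event names listed above; the BATTERY payload in the example call is illustrative, not a captured value:

```python
# Sketch: dispatch on the ring's plain-ASCII notifications from
# characteristic 00000007. Event names mirror the observed protocol;
# the argument format in the BATTERY example below is illustrative.
KNOWN_EVENTS = {"CLICK", "MIC_PRE_ON", "MIC_ON", "MIC_OFF", "BATTERY", "VER"}

def parse_notification(data):
    text = data.decode("ascii", errors="replace").strip()
    event, _, arg = text.partition(" ")
    if event not in KNOWN_EVENTS:
        return ("UNKNOWN", text)
    return (event, arg or None)

parse_notification(b"CLICK")           # ('CLICK', None)
parse_notification(b"BATTERY 3.9(V)")  # ('BATTERY', '3.9(V)')
```

Anything unrecognized is passed through whole rather than dropped, which is the behavior you want while a protocol is still being mapped.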

&lt;p&gt;The audio was the interesting part. While the mic is on, characteristic &lt;code&gt;00000001&lt;/code&gt; streams a steady 35.4 packets per second, each one 224 bytes. The question was: what codec?&lt;/p&gt;

&lt;p&gt;I wrote a hypothesis tester. Captured a session of myself speaking, saved every packet timestamped in a JSON file, then ran the same data through every plausible decoder: Opus, mu-law, A-law, raw PCM at various rates, and IMA ADPCM at 8 kHz and 16 kHz. Most produced noise. One produced my voice.&lt;/p&gt;

&lt;p&gt;IMA ADPCM, 16 kHz, mono, continuous state across packets. 224 bytes per packet gives you 448 samples at 4-bit depth, which is exactly 28 milliseconds of audio per BLE notification. The key detail that cost me an hour: the decoder state has to carry across packets. Reset it per-notification and you get static. Keep it running and you get clean speech.&lt;/p&gt;
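That decoder fits in a few dozen lines. A sketch using the standard IMA step and index tables, with the predictor and step index held on the object so they survive across notifications; the low-nibble-first byte order follows the common IMA convention and is an assumption a capture comparison would confirm:

```python
# Pure-Python IMA ADPCM decoder. The crucial property for the ring's
# stream: predictor and index persist across packets, so construct one
# decoder per session, not one per BLE notification. Bitwise steps are
# written as // and % for clarity (and identical results).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]

class ImaAdpcmDecoder:
    def __init__(self):
        self.predictor = 0  # last decoded sample; stream has no header
        self.index = 0      # position in STEP_TABLE

    def _decode_nibble(self, nibble):
        step = STEP_TABLE[self.index]
        diff = step // 8
        if nibble % 2:
            diff += step // 4
        if (nibble // 2) % 2:
            diff += step // 2
        if (nibble // 4) % 2:
            diff += step
        if nibble // 8:  # sign bit
            self.predictor -= diff
        else:
            self.predictor += diff
        self.predictor = max(-32768, min(32767, self.predictor))
        self.index = max(0, min(88, self.index + INDEX_TABLE[nibble % 8]))
        return self.predictor

    def decode(self, packet):
        samples = []
        for byte in packet:
            samples.append(self._decode_nibble(byte % 16))   # low nibble first
            samples.append(self._decode_nibble(byte // 16))  # then high
        return samples
```

Feed each 224-byte notification to the same instance and you get 448 samples back per packet; constructing a fresh decoder per packet is exactly the reset-per-notification mistake that produces static.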

&lt;p&gt;By 10:30 PM, the audio codec was identified and documented. By midnight, I'd built a guided capture tool with a PySide6 UI, a standalone probing script for interactive characteristic exploration, and a ring daemon that holds a persistent BLE connection and accepts commands over a named pipe.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2qgu08dc2yr3nxbbcar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2qgu08dc2yr3nxbbcar.png" alt="Dark living room with laptop in foreground, TV on in background and dark grey cat nearby" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Between midnight and 4 AM, I ran a systematic probe campaign on the unmapped write-only characteristics. Four of them silently accept arbitrary data with no observable effect. One controls the ring's purple LED, but only indirectly: the LED fires on BLE connection and can't be triggered independently. The ring has no vibration motor. No haptic feedback channel. No way to signal the wearer from software.&lt;/p&gt;

&lt;p&gt;At 4:05 AM, I closed the probe campaign and wrote the documentation. The protocol was fully mapped.&lt;/p&gt;

&lt;p&gt;At 5:09 AM, I started a new repo. A native macOS menubar app in Swift, consuming the protocol I'd just reverse-engineered. Hand-rolled IMA ADPCM decoder (Apple's built-in AudioToolbox does ADPCM, but it expects Apple's variant with 34-byte frames, not the ring's 224-byte continuous stream). By mid-morning, the Mac app had a working BLE client with auto-reconnect, tested ADPCM decoder, and an audio pipeline design spec.&lt;/p&gt;

&lt;p&gt;I went to a two-year-old's Cars-themed birthday party that afternoon. There were checkered flags and Lightning McQueen balloons. I sang happy birthday. I had not slept.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh08op9jwpoip75j2dqzf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh08op9jwpoip75j2dqzf.png" alt="POV - Sitting at table eating cake at Lightning McQueen themed birthday" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Ring Actually Is
&lt;/h2&gt;

&lt;p&gt;Here's the complete protocol, because someone searching for this in eighteen months deserves to find it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ring → Phone (notifications on char 00000007)
CLICK           button pressed
MIC_PRE_ON      raise-to-speak gesture detected
MIC_ON          mic active, audio streaming on char 00000001
MIC_OFF         mic deactivated
BATTERY N(V)    battery voltage response
VER XXXX        firmware version response

Phone → Ring (writes to char 00000007)
LOCK            disable ring input (hard mute)
BATTERY         query battery level
GET_VERSION     query firmware version
RESET           reboot ring (kills BLE connection)

Audio (char 00000001, while MIC_ON)
Codec:    IMA ADPCM, 4-bit, 16 kHz mono
Frame:    224 bytes = 448 samples = 28 ms
Rate:     ~35.4 packets/second
State:    continuous across packets (do NOT reset per notification)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No pairing required. No authentication. No session handshake. If you can see it, you can talk to it.&lt;/p&gt;

&lt;p&gt;The ring accepts exactly one BLE connection at a time. If your iPhone has the WIZPR app running, the ring is connected to it and won't advertise. Disconnect the phone first.&lt;/p&gt;
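With the protocol above, a complete capture client is connect, subscribe, decode, write. The last step takes nothing beyond the standard library; a minimal sketch, assuming samples holds signed 16-bit integers from an ADPCM decoder:

```python
import wave

# Sketch: persist decoded ring audio using only the stdlib wave module.
# `samples` is assumed to be a list of signed 16-bit ints, i.e. the
# output of an IMA ADPCM decoder running on the 16 kHz mono stream.
def write_wav(path, samples, rate=16000):
    pcm = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)     # mono
        f.setsampwidth(2)     # 16-bit PCM
        f.setframerate(rate)  # the ring streams at 16 kHz
        f.writeframes(pcm)
```

WAV wants little-endian PCM, so the samples are packed explicitly rather than relying on the machine's native byte order.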

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The WIZPR Ring ships with an app that routes your voice through their cloud for processing. That's fine. It works. But the ring itself is just a microphone and a button on your finger with a BLE radio. There's no reason the audio has to go through their servers.&lt;/p&gt;

&lt;p&gt;With the protocol mapped, the ring becomes a general-purpose voice input device. A tactile, always-on-your-hand trigger for anything that can listen to a BLE characteristic and decode ADPCM audio. For me, that means feeding it into my own local AI stack. For someone else, it might mean accessibility tooling, or voice-triggered home automation, or a wearable dictation device that never touches the cloud.&lt;/p&gt;

&lt;p&gt;The official app is one client. Now anyone can write another.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Repo
&lt;/h2&gt;

&lt;p&gt;Everything is at &lt;a href="https://github.com/niclydon/wizpr-tools" rel="noopener noreferrer"&gt;niclydon/wizpr-tools&lt;/a&gt;. The protocol reference, the audio codec documentation, the capture tool, the probing scripts, and a 50-line quickstart that connects to the ring and records a WAV file.&lt;/p&gt;

&lt;p&gt;It wouldn't exist without the upstream work from &lt;a href="https://github.com/R-D-BioTech-Alaska/Wizpr-Suite" rel="noopener noreferrer"&gt;R-D-BioTech-Alaska/Wizpr-Suite&lt;/a&gt;. They did the hard part. I mapped what they found.&lt;/p&gt;




&lt;p&gt;The ring is on my desk right now, sitting on its charging cradle, its purple LED dark. It's not connected to anything. But I know it's listening for a connection, cycling through its advertisement packets every few seconds, waiting for someone to subscribe.&lt;/p&gt;

&lt;p&gt;It waited twenty-five months to reach me. I couldn't put it down.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>iot</category>
      <category>wearables</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>I have NFC implants in both hands. I only installed one of them myself.</title>
      <dc:creator>Nic Lydon</dc:creator>
      <pubDate>Mon, 04 May 2026 14:55:24 +0000</pubDate>
      <link>https://dev.to/niclydon/i-have-nfc-implants-in-both-hands-i-only-installed-one-of-them-myself-21ep</link>
      <guid>https://dev.to/niclydon/i-have-nfc-implants-in-both-hands-i-only-installed-one-of-them-myself-21ep</guid>
      <description>&lt;p&gt;I have two NFC chip implants. One in each hand. I put the one in my left hand in myself, on a Saturday afternoon in January 2024, after spending two or three minutes holding the needle against my skin and slowly working out the exact spot. The one in my right hand I drove an hour out to Worcester for, paid a guy $150, and held still while he did the install in a tattoo and piercing studio.&lt;/p&gt;

&lt;p&gt;Same technology, more or less. Two completely different installation paths. There's a reason for that, and the reason is also the reason one of them has a blue LED that lights up when I scan it and the other one doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first implant: left hand, just in case
&lt;/h2&gt;

&lt;p&gt;I'm right-handed. So when I ordered my first implant in January 2024 (a Dangerous Things &lt;code&gt;NExT&lt;/code&gt;, which is a dual-chip module containing both a 125 kHz RFID transponder and a 13.56 MHz NTAG216 NFC chip), I knew it was going in my left hand. If something went wrong, if I hit a tendon, if the chip migrated weird, if I got an infection, I wanted my dominant hand to be the unaffected one.&lt;/p&gt;

&lt;p&gt;The original plan was to find someone to install it for me. I called a tattoo and piercing place a friend recommended. They said no, it wasn't legal in Massachusetts for a body piercer to perform implant procedures. I called a guy in Worcester from the Dangerous Things partner list. Same answer. He told me to look at Connecticut or Rhode Island.&lt;/p&gt;

&lt;p&gt;I'm not driving to a different state for this. So I watched a YouTube video of someone doing a self-install end to end, grabbed some alcohol wipes, and decided to do it myself. As I told a friend at the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I might end up doing it myself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A few hours later:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I implanted a chip in my hand today. Do you want to see the video?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I filmed the whole thing on my phone. The injector that comes with the NExT kit is essentially a very large gauge hypodermic needle preloaded with the bioglass chip. The needle is wide. It is also, as I noted afterward in real time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That was a very large gauge needle and not very sharp.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=NDz6vDsVTAY" rel="noopener noreferrer"&gt;Youtube: Self implanting NExT RFID &amp;amp; NFC chip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The actual install was less dramatic than I expected. I cleaned the area, marked the entry point, pinched the skin between thumb and forefinger of my right hand, and pressed the needle against my left hand at a shallow angle. Then I waited. I held it there for two or three minutes, adjusting my grip, making sure I had the exact angle and depth I wanted, before committing.&lt;/p&gt;

&lt;p&gt;That part isn't in any of the videos. The videos cut from "needle approaches skin" to "needle in skin" without showing the slow buildup of confidence. But that buildup is most of what makes the difference between a clean install and a bad one. Once I was certain, the needle pushed right in. Almost no blood. The whole thing took maybe ninety seconds of actual contact, surrounded by an hour of preparation.&lt;/p&gt;

&lt;h2&gt;
  
  
  A planning constraint most people don't have
&lt;/h2&gt;

&lt;p&gt;One detail in the timing that's specific to me: I take Humira (adalimumab) every other Friday for an autoimmune condition. Humira is an immunosuppressant, which means the days right after each dose are not the days you want to be healing a fresh wound. So I scheduled the self-install for the off weekend, the one when my immune system would be at full strength.&lt;/p&gt;

&lt;p&gt;That's the kind of planning constraint that doesn't show up in any biohacking blog because most biohackers aren't on biologics. But if you're reading this and you're on any kind of immunosuppressive therapy, it's worth thinking about. Tiny puncture, but still a puncture, and your healing window matters.&lt;/p&gt;

&lt;p&gt;The healing took about ten days. There was some bandage irritation, a tiny scab, and a faint bruise. By the end of the first week I could already read both chips through the bandage. The badge clone for my office front door was working before the wound was fully closed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part where the medical system doesn't have a checkbox for this
&lt;/h2&gt;

&lt;p&gt;Before ordering the second implant, I wanted to confirm the first one was sitting where it should be. Seemed responsible. I asked my doctor for an x-ray of my left hand.&lt;/p&gt;

&lt;p&gt;She was not thrilled. I don't think she'd encountered this before, and her initial response made it pretty clear she wasn't sure what to do with a patient who'd voluntarily implanted an RFID chip in his own hand and was now asking for imaging to check on it. I get it. There's no protocol for this. There's no ICD code for "patient self-implanted microchip, requests follow-up imaging, reports no symptoms." The conversation had a tone.&lt;/p&gt;

&lt;p&gt;My position was simple: I had a foreign object inside of my hand. I'd like to know where it is and whether it's sitting right. That seems like exactly the kind of thing an x-ray is for. I ended up getting the imaging. Everything looked fine. No migration, no issues, nothing unexpected. Good news, because I'd already bought the second chip.&lt;/p&gt;

&lt;p&gt;In hindsight, having the conversation with your doctor before the first install is probably the smarter move. I went in after the fact and that made the interaction harder than it needed to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The second implant: right hand, professionally installed
&lt;/h2&gt;

&lt;p&gt;By mid-February I'd been living with the NExT for a month. The hand worked fine. The chip worked fine. The healing had been clean. So I started thinking about a second one.&lt;/p&gt;

&lt;p&gt;I knew I wanted an &lt;code&gt;xSIID&lt;/code&gt;. Same NFC functionality as the NTAG side of the NExT, but with one critical addition: a tiny LED that pulls power from the field when you scan it and lights up. Mine is blue. When I tap my hand on a phone or reader, my knuckle glows.&lt;/p&gt;

&lt;p&gt;This time I made a different decision about installation. From a text I sent that February:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm getting my first LED implant. It arrives on Tuesday. I'm having it put in my right hand, and I'm a righty, so I'm going to go out to Worcester and have that guy do it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three things changed between the first install and the second. First, the right hand is my dominant hand, and an LED implant only pays off if it's in a hand you actually wave around. The whole point is the show-and-tell, and you show with your dominant hand. Second, I was past the &lt;em&gt;"can I prove I can do this myself"&lt;/em&gt; stage. The first install validated the technique. The second install was about getting the placement perfect, because:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I want the LED in P1 or P2 because it will show up better under the thinner skin.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The "P0/P1/P2" notation is from the Dangerous Things community standard for hand implant locations. P0 is the webbing between thumb and index finger. P1 is the back of the hand near the index knuckle. The thinner the skin over the LED, the more visible the glow. I wasn't going to risk getting that placement wrong on my dominant hand by doing it myself.&lt;/p&gt;

&lt;p&gt;Third, the practical issue. I needed a driver. From the same conversation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm getting it in my right hand, so I won't be able to drive afterwards.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Worcester is about an hour from where I live. The installer charged $150. He knew exactly what he was doing. He had implanted magnets in the back of his own hand that he used as tool holders during the procedure, sticking the needles and scalpels to his hand instead of laying them down on the table. He took one look at the small star tattoo on my right index finger knuckle and said, &lt;em&gt;"as soon as I saw the star tattoos, I was hoping you were gonna tell me it was going under that."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xg8z9awl5v6tgs8jf4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xg8z9awl5v6tgs8jf4p.png" alt="NFC Implant Install Illustration" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The xSIID went in. It hurt about as much as the first one. The LED took several days to be visible, because there's a lot of trapped blood at the install site that takes time to clear. By the second week, I could see the blue glow when I scanned my hand, and now, two years later, it's a solid clear blue every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually do with them
&lt;/h2&gt;

&lt;p&gt;People ask if you can pay for things with these. You can't, not really, not in the US. The implants don't have the secure element that contactless payment cards use. Anyone telling you otherwise is either using a workaround that expires or is in Europe.&lt;/p&gt;

&lt;p&gt;Here's what mine actually do, on a normal Tuesday:&lt;/p&gt;

&lt;p&gt;The NExT in my left hand is a clone of my office building access badge. I tap my left hand against the reader, the door unlocks. I haven't needed to carry a physical badge in two years.&lt;/p&gt;

&lt;p&gt;The xSIID in my right hand handles what the Dangerous Things community calls a "fistbump login." I have a small USB device on my desk, a Dangerous Things &lt;code&gt;KBR1&lt;/code&gt;. It's a 13.56 MHz reader that presents itself to the computer as a USB HID keyboard. When I tap my right hand on it, it types out the chip's UID followed by Enter. My laptop password is that UID concatenated with a memorized suffix, so the login flow is: tap the reader, then type the rest. Something I know, plus something embedded in my hand. Two factors, one motion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4bapc1pzmxu0fwqq5hx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4bapc1pzmxu0fwqq5hx.png" alt="NFC Implanted Chip hand against NFC scanner" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is that proper cryptographic multi-factor authentication? No. The UID is a public identifier. Anyone with a 13.56 MHz reader and physical proximity to my hand could capture it. Anyone with a Flipper Zero could clone it onto a Magic NTAG card. What it actually is is a long random string I never have to remember and never type, that an attacker can't get from a phishing page or a keylogger alone.&lt;/p&gt;
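To put a number on that "long random string" (my arithmetic, with the assumptions in the comments): an NTAG216 UID is 7 bytes, and the first byte is NXP's fixed manufacturer code, so even treating the remaining bytes as random, which is generous since UIDs are assigned rather than random, the ceiling is modest:

```python
import math

# Rough upper bound on the entropy the UID portion adds to a password.
# Assumptions: NTAG216 UIDs are 7 bytes; the first byte is NXP's fixed
# manufacturer code (0x04), so at most 6 bytes vary. Real UIDs are
# assigned, not random, so this is an optimistic ceiling.
uid_bits = 6 * 8  # 48

# Equivalent length in random lowercase-alphanumeric characters,
# each worth log2(36) bits.
equivalent_chars = math.ceil(uid_bits / math.log2(36))  # 10
```

Roughly the strength of ten extra random lowercase-alphanumeric characters bolted onto the memorized part, at best.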

&lt;p&gt;I'm comfortable with the tradeoff. The threat model on my personal laptop doesn't include nation-state actors close enough to my hand to scan it. If it did, I'd add a Yubikey. For everyday use, the convenience and the small entropy boost together are a real upgrade over a memorable password.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;Two things. First, I'd consider doing the install on the LED implant earlier, while I was still in the &lt;em&gt;"I just installed one myself"&lt;/em&gt; mindset. The Worcester install was easy in retrospect. Splitting the two installs by six weeks broke a streak that I might have been able to ride.&lt;/p&gt;

&lt;p&gt;Second, I'd think harder about the LED placement. P1 is right at the index knuckle, which is the most expressive part of the hand and the part you naturally point with. Mine looks great, but the chip is large enough that I notice it when I make a fist. P0, in the webbing, would have been less visible day-to-day but maybe more comfortable for fine motor work. If you're considering an LED implant, the visibility-vs-comfort tradeoff is real and you should think about it before, not after.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;Two years on, the implants are part of how I interact with the physical world. I forget about them most of the time. Then I tap my hand on a reader, the LED catches, and I remember. Whatever the future of personal identification looks like, I've spent two years living with a version of it.&lt;/p&gt;

&lt;p&gt;The next round of projects is about building my own readers. I have a 3D printer and a soldering iron arriving this week, a stack of ESP32 boards, and a &lt;code&gt;PN5180&lt;/code&gt; long-range NFC module on the way. The first build is a bedside reader. The second is a USB-HID desk reader to replace the KBR1 on a second workstation. The one after that is, well, more theatrical.&lt;/p&gt;

&lt;p&gt;But that's a different post.&lt;/p&gt;

</description>
      <category>devjournal</category>
      <category>discuss</category>
      <category>iot</category>
      <category>science</category>
    </item>
    <item>
      <title>The Machine Zone: Ignition</title>
      <dc:creator>Nic Lydon</dc:creator>
      <pubDate>Sat, 02 May 2026 15:40:40 +0000</pubDate>
      <link>https://dev.to/niclydon/the-machine-zone-ignition-4p2k</link>
      <guid>https://dev.to/niclydon/the-machine-zone-ignition-4p2k</guid>
      <description>&lt;p&gt;I installed Claude Code on March 12. Thirty-five days later I had written 557,000 lines of code across fifteen repositories that had not existed before. None of them had existed in my name before March 17. I had never owned a git repository. I typed &lt;code&gt;git init&lt;/code&gt; on something of my own for the first time in my life on a Tuesday morning at 11:46 a.m. Eastern. Four weeks later I had fifteen repositories and half a million lines.&lt;/p&gt;

&lt;p&gt;I work in cybersecurity. I am forty-five years old. I have been working in technology for twenty years. I know what sustainable output looks like. This was not it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0bm6gxrqyqml7ge10fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0bm6gxrqyqml7ge10fb.png" alt=" " width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The largest single project is ARIA (Adaptive Responsive Intelligent Assistant), my personal assistant and behavioral-DNA tool, at 1,033 commits and 220,536 net lines. Second is Nexus, a centralized data lake that stitches my iMessage, Gmail, health, and calendar history together, at 369 commits. Third is Chancery, an agent-orchestration and observability layer, at 322 commits. Fourth is niclydon.com, a redesign of my personal site, at 164 commits. Forge, my home-lab LLM gateway, runs on a stack I rebuilt inside the window. Broadside, an AI drafting pipeline, reads from CHANGES.md and git history across every project I maintain and writes posts in my voice.&lt;/p&gt;

&lt;p&gt;These are not side projects. This is my whole stack, remade. They function. They save me time. Some of them save my family time. I am not going to stand here and say I regret what I built, because I do not regret what I built.&lt;/p&gt;

&lt;p&gt;What I regret, insofar as I regret anything, is the rate at which this happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mechanic
&lt;/h2&gt;

&lt;p&gt;Claude Code is an agentic coding assistant that runs in your terminal. You describe what you want to build, and it writes the code, runs tests, fixes bugs, and commits the results. The interaction is conversational: you type a request, it takes actions, you see the results, you respond. Each exchange takes seconds.&lt;/p&gt;

&lt;p&gt;The pattern that emerged was simple: I would ask for a feature. It would build it. I would see something adjacent that needed fixing. I would ask for that. It would fix it. I would notice something else. The next request was always one keystroke away. The gap between wanting the next action and getting it rounded to whatever the API latency was — usually under two seconds.&lt;/p&gt;

&lt;p&gt;This is what I mean by "the loop." Not a metaphor. The literal interaction pattern: request, response, next request. Variable-ratio reinforcement with near-zero latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the body said
&lt;/h2&gt;

&lt;p&gt;I have the Apple Health export. The shell history. The git metadata. The Claude Code session transcripts. The billing records. All of it lives on disk. I pulled the numbers because adjectives rot and numbers are harder to argue with.&lt;/p&gt;

&lt;p&gt;Baseline window is January 16 through March 11. The ignition week is March 12 through March 19.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps.&lt;/strong&gt; Baseline median: 12,250 per day. Ignition week median: 1,636.5 per day. That is an &lt;strong&gt;86.6 percent drop&lt;/strong&gt;. The single lowest day in my ninety-day record is March 19 at &lt;strong&gt;243 steps&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sleep.&lt;/strong&gt; Baseline median nightly sleep: 5.88 hours. Five of the eight nights in the ignition week have &lt;strong&gt;no primary sleep detected at all&lt;/strong&gt;. The Apple Watch could not find a contiguous block long enough to call it a night. My longest bracketed gap without meaningful sleep during the week is approximately forty-eight hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sleep midpoint.&lt;/strong&gt; Baseline median: 3:39 a.m. Ignition median: 5:43 a.m. A shift of two hours and four minutes. Wake time moved four hours and forty-eight minutes later. Sleep time moved twelve minutes later. I was not going to bed earlier and sleeping in. I was going to bed at roughly my old clock and not waking up until much further into the morning. The body was compensating in the one direction it had left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heart rate variability.&lt;/strong&gt; Baseline median: 74.7 ms. Ignition median: 66.4 ms. An eleven percent drop, and no recovery thirty days later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Photos taken.&lt;/strong&gt; Baseline median: 104 per day. Ignition median: three per day. A &lt;strong&gt;97.2 percent drop in life-documentation activity over one week&lt;/strong&gt;. The away-from-home share collapsed harder: 87 percent of my baseline photos were taken somewhere other than my house. During the ignition week: 25 percent.&lt;/p&gt;

&lt;p&gt;The phone stopped going places.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zxye51cxp8asbgqwvsn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zxye51cxp8asbgqwvsn.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The receipts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;March 12, 9:19 p.m. ET.&lt;/strong&gt; Anthropic welcome email. "Ship your first commit in 5 minutes."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 13, 8:56 p.m. ET.&lt;/strong&gt; First API credit cutoff. I had been running the agent for less than twenty-four hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 13, 9:23 p.m. ET.&lt;/strong&gt; $95.63 API credit top-up. Twenty-seven minutes after the cutoff. I did not reflect. I bought more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 17, 11:46 a.m. ET.&lt;/strong&gt; The first git commit in the history of any repository I have ever owned. I have shipped code at work, on contract, as a hobbyist. I had never run &lt;code&gt;git init&lt;/code&gt; on my own machine and lived with the result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 17, later that day.&lt;/strong&gt; Apple receipt for Claude Max 20x at $249.99. I pivot from Pro to Max mid-morning on a Monday. The commitment is made with a tap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 19.&lt;/strong&gt; 243 steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 22, 9:31 p.m. ET.&lt;/strong&gt; Third API cutoff. &lt;strong&gt;March 22, 9:33 p.m. ET.&lt;/strong&gt; Next top-up. Seventy-eight seconds.&lt;/p&gt;

&lt;p&gt;Across the first ten days: $305.52 in API top-ups on top of the $249.99 Max subscription. I spent more on Claude credit that week than I spent on groceries that month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 2, 1:50 a.m. ET.&lt;/strong&gt; My aunt, who was in hospice, passed away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 4&lt;/strong&gt;, two days after her death, is the highest-activity Claude Code day in my entire ninety-day record. 23,476 events. 21.7 active hours. I slept 102 minutes. I took one photo. My longest unbroken session ran from April 3 at 8:34 p.m. to April 4 at 11:16 p.m. The session crossed the day-after-her-death barrier without pausing.&lt;/p&gt;

&lt;p&gt;I shipped production code that day that is still running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 5, 9:02 p.m. ET.&lt;/strong&gt; The largest single API credit top-up of the thirty-five days, $106.25, goes through. Three days after my aunt's death.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y3u6o46szmv3plbg80e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y3u6o46szmv3plbg80e.png" alt=" " width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am not going to dramatize any of this. I am listing it because when I say "the loop compounds with grief rather than interrupting for it," these are the receipts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The machine zone
&lt;/h2&gt;

&lt;p&gt;There is a researcher named Natasha Dow Schüll who spent more than a decade inside Las Vegas casinos watching slot machine players. Her book is called &lt;a href="https://a.co/d/04SLvwvA" rel="noopener noreferrer"&gt;&lt;em&gt;Addiction by Design&lt;/em&gt;&lt;/a&gt;. The thing she names is not addiction in the chemical sense. It is a state players call "the zone." A suspension of self, a narrowing of attention, a sense of the outside world fading. The machines are engineered for it. Variable reward schedules, near-misses that register as almost-wins, sensory feedback tuned to a frequency just below conscious attention.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.smithsonianmag.com/science-nature/bf-skinner-the-man-who-taught-pigeons-to-play-ping-pong-and-rats-to-pull-levers-5363946/" rel="noopener noreferrer"&gt;B.F. Skinner demonstrated the mechanism in the 1950s with pigeons and a lever.&lt;/a&gt; Variable-ratio reinforcement — reward coming at unpredictable intervals — produces more persistent behavior than any other schedule. Persistent meaning: the pigeon will keep pressing the lever long after the reward has stopped. Harder to extinguish. More compulsive.&lt;/p&gt;

&lt;p&gt;The Civilization loop is a variable-ratio reinforcement schedule with a progress bar attached. Every turn yields something, some turns yield a great deal, and the next turn is always one click away. The human who plays it is not broken. The human is operating correctly inside a system that was engineered to produce exactly this behavior.&lt;/p&gt;

&lt;p&gt;Claude Code is a variable-ratio reinforcement schedule with a diff attached. Every tool call yields something, some yield the feature you were trying to build, and the next tool call is always one keystroke away.&lt;/p&gt;

&lt;p&gt;I ran the numbers on all 2,009 Claude Code session transcripts on my two machines. The cache-read token share — the fraction of context tokens served from Anthropic's prompt cache instead of a full re-encode — is 98.32 percent across the cohort. During the week my aunt died it was 99.25 percent. The gap between "I want the next action" and "the next action arrives" is whatever the cache-read latency is. My local compute cost per additional turn has rounded to zero.&lt;/p&gt;
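&lt;p&gt;For what it's worth, the measurement itself is a few lines. This is a sketch of how I compute that share, assuming the transcripts are JSONL with an Anthropic-style &lt;code&gt;usage&lt;/code&gt; block per message; the exact file layout on your machine may differ.&lt;/p&gt;

```python
import json
from pathlib import Path

def cache_read_share(transcript_dir):
    # Sum token-usage counters across every session transcript.
    # The counter names match the Anthropic API "usage" block; the
    # JSONL layout is an assumption about the transcript format.
    cached = fresh = 0
    for path in Path(transcript_dir).glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            usage = json.loads(line).get("message", {}).get("usage")
            if usage:
                cached += usage.get("cache_read_input_tokens", 0)
                fresh += usage.get("input_tokens", 0)
                fresh += usage.get("cache_creation_input_tokens", 0)
    total = cached + fresh
    return cached / total if total else 0.0
```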

&lt;p&gt;The same work at sixty percent of the throughput would have ended with my body still calibrated, my inner circle still met on weekends, my camera roll full of my cat instead of empty, and my April 2 free for my aunt. The same work was available at that rate. The loop is not what made the work happen. The loop is what made me do it at a speed that did damage.&lt;/p&gt;

&lt;p&gt;Both of those things are true and they sit inside the same person and they are not going to resolve into the simpler version.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I wrote next
&lt;/h2&gt;

&lt;p&gt;This was the first of three. The other two are on Substack, since they go further into territory that isn't really dev.to-shaped:&lt;/p&gt;

&lt;p&gt;If you want to read about what the loop did to my closest relationships, that's &lt;strong&gt;&lt;a href="https://niclydon.substack.com/p/the-machine-zone-twenty-eight-times" rel="noopener noreferrer"&gt;Part II: Twenty-eight Times Slower&lt;/a&gt;&lt;/strong&gt;. My median text reply time to the people closest to me went from 1.1 minutes to 31.2 minutes in twelve days. The interesting part wasn't that it dropped. It was the shape — broadcast, not silence.&lt;/p&gt;

&lt;p&gt;If you want to read about what the loop kept building even after I thought it was over — agents that talked like me, a system writing biographies of the person building it — that's &lt;strong&gt;&lt;a href="https://niclydon.substack.com/p/the-machine-zone-the-rate" rel="noopener noreferrer"&gt;Part III: The Rate&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>mentalhealth</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>I Fixed My LLM OOM Crashes by Shrinking the Draft Model (Speculative Decoding on Real Hardware)</title>
      <dc:creator>Nic Lydon</dc:creator>
      <pubDate>Fri, 01 May 2026 23:27:56 +0000</pubDate>
      <link>https://dev.to/niclydon/i-fixed-my-llm-oom-crashes-by-shrinking-the-draft-model-speculative-decoding-on-real-hardware-1afb</link>
      <guid>https://dev.to/niclydon/i-fixed-my-llm-oom-crashes-by-shrinking-the-draft-model-speculative-decoding-on-real-hardware-1afb</guid>
      <description>&lt;p&gt;The fix was swapping a 4B draft model for a 0.6B one in my speculative decoding config. That's the whole punchline. But the path there touched every assumption I had about how spec decode interacts with VRAM budgets on consumer hardware, so here's the full story.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4B draft → 0.6B draft&lt;/td&gt;
&lt;td&gt;~2 GiB saved, same MoE throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding parallelism 16 → 8&lt;/td&gt;
&lt;td&gt;~8 GiB freed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Combined&lt;/td&gt;
&lt;td&gt;Dropped from ~97 GiB to ~87.7 GiB, no more OOM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Spec decode isn't free. You're paying VRAM for both models simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I run a local LLM inference gateway on two AMD-based mini PCs — GMKTec EVO-X2 boxes with Strix Halo APUs and 160 GB of unified memory each. The gateway serves around 20 models through &lt;code&gt;llama-swap&lt;/code&gt;, a process manager that loads and evicts models on demand behind an OpenAI-compatible API. Think of it as a poor man's model router: one port per logical model, &lt;code&gt;llama-swap&lt;/code&gt; starts the right &lt;code&gt;llama.cpp&lt;/code&gt; process on request, and idle models get evicted when memory gets tight.&lt;/p&gt;
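&lt;p&gt;If you haven't used it, a &lt;code&gt;llama-swap&lt;/code&gt; config is a small YAML file mapping model names to the command that serves them. This is an illustrative sketch, not my actual config; the keys follow the llama-swap README, and the paths and flags are placeholders:&lt;/p&gt;

```yaml
# Illustrative llama-swap config (paths and flags are placeholders)
models:
  "qwen3.5-122b-a10b":
    cmd: llama-server --port ${PORT} -m /models/qwen3.5-122b-Q4_K_M.gguf
    ttl: 300   # seconds of idle time before the model is unloaded
  "embed":
    cmd: llama-server --port ${PORT} -m /models/embed.gguf --embeddings
```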




&lt;h2&gt;
  
  
  Speculative Decoding (Quick Context)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb7j2fg1ysfnwgkavghu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb7j2fg1ysfnwgkavghu.png" alt="Speculative decoding diagram" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Speculative decoding pairs a large target model with a smaller draft model. The draft proposes tokens cheaply; the target verifies them in a single forward pass. When the draft is right — and for well-matched model families, it often is — you get roughly 1.5–2× throughput. The important detail that bites people: both models are resident in memory at the same time.&lt;/p&gt;
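&lt;p&gt;The control flow, stripped of real models, looks like this. A greedy-decoding sketch with toy stand-in "models" (any function from a token sequence to a next token); real engines batch the verification into one forward pass, which is where the speedup comes from.&lt;/p&gt;

```python
def speculative_decode(target, draft, prompt, k=4, steps=8):
    # Greedy speculative decoding sketch. `target` and `draft` are
    # stand-ins for model forward passes: sequence -> next token.
    out = list(prompt)
    for _ in range(steps):
        # 1. The draft proposes k tokens cheaply.
        proposed = []
        for _ in range(k):
            proposed.append(draft(out + proposed))
        # 2. The target verifies each position. (A real engine does
        #    this in ONE batched forward pass, which is the speedup.)
        accepted = []
        for i in range(k):
            verified = target(out + accepted)
            accepted.append(verified)
            if verified != proposed[i]:
                break   # first mismatch: discard the rest of the draft
        out.extend(accepted)
    return out
```

&lt;p&gt;With a well-matched draft the accepted run is long and the target pass is amortized over several tokens; with a mismatched draft you fall back to roughly one verified token per step. Either way the output is identical to running the target alone, which is why draft choice is purely a speed/memory decision.&lt;/p&gt;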




&lt;h2&gt;
  
  
  The Bad Assumption
&lt;/h2&gt;

&lt;p&gt;I was running a blanket policy: every Qwen3-family model gets the Qwen3-4B draft. Four billion parameters felt like the safe middle ground — big enough to draft well, small enough to fit. Or so I thought.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Crash
&lt;/h2&gt;

&lt;p&gt;The problem surfaced when I tried to load &lt;code&gt;qwen3.5-122b-a10b&lt;/code&gt; (roughly 71 GiB at Q4_K_M) alongside my always-resident embedding model. On paper, the embedding model was supposed to run around 16 GiB. In practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embed:             ~23.8 GiB
122B + 4B draft:   ~73.6 GiB
─────────────────────────────
total:             ~97.4 GiB
available:         ~96.0 GiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Intermittent OOM crashes followed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Diagnosis
&lt;/h2&gt;

&lt;p&gt;Pulling real numbers from &lt;code&gt;rocm-smi&lt;/code&gt; told a different story than my estimates. The embedding model was actually consuming 23.8 GiB, not 16. The culprit was KV cache pre-allocation: with parallelism set to 16 and context at 8,192 tokens, the runtime was pre-allocating 16 full-context-length KV cache slots simultaneously, and that adds up fast.&lt;/p&gt;
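&lt;p&gt;You can sanity-check that with a back-of-envelope formula. The dimensions below are hypothetical (a generic dense model with grouped-query attention and an fp16 KV cache), not the real embedding model's config; the point is how slot count multiplies the budget.&lt;/p&gt;

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx, n_slots,
                 bytes_per_elem=2):
    # K and V each hold one head_dim vector per layer, per KV head,
    # per cached token; fp16 means 2 bytes per element.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_slots * ctx * per_token / 2**30

# Hypothetical dims, NOT the real model's config:
# 32 layers, 8 KV heads (GQA), head_dim 128, 8,192-token slots.
print(kv_cache_gib(32, 8, 128, 8192, n_slots=16))  # 16.0 GiB
print(kv_cache_gib(32, 8, 128, 8192, n_slots=8))   # 8.0 GiB
```

&lt;p&gt;Slot count multiplies the whole KV budget, so halving parallelism halves the pre-allocation.&lt;/p&gt;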




&lt;h2&gt;
  
  
  Two Knobs, Both Pulled
&lt;/h2&gt;

&lt;p&gt;At that point I had two levers: reduce embedding parallelism, or shrink the draft model. I did both.&lt;/p&gt;

&lt;p&gt;Dropping embedding parallelism from 16 to 8 freed roughly 8 GiB while keeping context length at 8,192 tokens, which still comfortably covers my p99 usage around 2,532 tokens. On the draft side, the key insight was that not every model needs the same draft. A 0.6B draft — about 0.4 GiB — performs nearly as well as the 4B for MoE architectures, where sparse activation already limits how much a larger draft model can contribute. Total consumption dropped from roughly 97 GiB to around 87.7 GiB. Stable, no crashes.&lt;/p&gt;
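&lt;p&gt;In &lt;code&gt;llama-server&lt;/code&gt; terms, the two levers look roughly like this. Flag names are from recent llama.cpp builds and the file names are placeholders; check &lt;code&gt;llama-server --help&lt;/code&gt; on your version, since context/slot semantics vary between releases.&lt;/p&gt;

```shell
# Knob 1: embedding server, parallelism 16 -> 8
llama-server -m /models/embed.gguf --embeddings \
  --parallel 8 --ctx-size 8192

# Knob 2: swap the 4B draft for the 0.6B draft
llama-server -m /models/qwen3.5-122b-a10b-Q4_K_M.gguf \
  --model-draft /models/qwen3-0.6b-Q8_0.gguf --draft-max 16
```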

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8wlwfue6kkom7cibroi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8wlwfue6kkom7cibroi.png" alt="VRAM usage after fix" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Measure actual VRAM usage, not estimated usage. They are not the same number.&lt;/li&gt;
&lt;li&gt;Draft model sizing should follow model architecture, not a one-size-fits-all policy.&lt;/li&gt;
&lt;li&gt;KV cache pre-allocation scales with parallelism — and it will surprise you.&lt;/li&gt;
&lt;li&gt;Spec decode costs memory. Budget for two models, not one.&lt;/li&gt;
&lt;li&gt;Working inside tight constraints forces you to understand your system at a level that comfortable headroom never would.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
