<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: xiaocai oh</title>
    <description>The latest articles on DEV Community by xiaocai oh (@xiaocai_oh_07632a08eb20c6).</description>
    <link>https://dev.to/xiaocai_oh_07632a08eb20c6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3834566%2F37de31ff-fabe-4407-a727-c1819f7bdbdf.png</url>
      <title>DEV Community: xiaocai oh</title>
      <link>https://dev.to/xiaocai_oh_07632a08eb20c6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xiaocai_oh_07632a08eb20c6"/>
    <language>en</language>
    <item>
      <title>ambient-voice v2: How Deleting Whisper and Adding a JSON File Made Our Voice Pipeline Better</title>
      <dc:creator>xiaocai oh</dc:creator>
      <pubDate>Sat, 28 Mar 2026 12:59:58 +0000</pubDate>
      <link>https://dev.to/xiaocai_oh_07632a08eb20c6/ambient-voice-v2-how-deleting-whisper-and-adding-a-json-file-made-our-voice-pipeline-better-5e5h</link>
      <guid>https://dev.to/xiaocai_oh_07632a08eb20c6/ambient-voice-v2-how-deleting-whisper-and-adding-a-json-file-made-our-voice-pipeline-better-5e5h</guid>
      <description>&lt;p&gt;Last month I open-sourced &lt;a href="https://github.com/Marvinngg/ambient-voice" rel="noopener noreferrer"&gt;ambient-voice&lt;/a&gt; — a macOS voice input tool built entirely on Apple-native frameworks. The headline feature was context biasing: it OCRs your screen before you speak, so the recognizer already knows your domain.&lt;/p&gt;

&lt;p&gt;But the &lt;em&gt;other&lt;/em&gt; headline feature — a self-improving distillation pipeline — turned out to be over-engineered. Here's what we changed in v2, and what we learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The v1 Pipeline (RIP)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Audio → Whisper re-transcription ──┐
                                    ├─→ Merge → QLoRA → ollama
User correction capture (30s) ─────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Whisper was a GPU tax.&lt;/strong&gt; Re-transcribing 30 min of audio took ~2 hours on a GPU server. Most users don't have spare compute for background distillation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Correction capture was noisy.&lt;/strong&gt; Users edit text for many reasons — rephrasing, restructuring, deleting. Only a fraction of edits are actual recognition error corrections. The training data was polluted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The feedback loop never closed.&lt;/strong&gt; Need dozens of data points → training run → model deploy. Too slow for anyone to see improvement.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The v2 Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dictionary.json + raw transcription → Gemini correction → QLoRA → ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;

&lt;h3&gt;
  
  
  dictionary.json
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"terms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Sharpe ratio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MPLS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claude Code"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MCP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"QLoRA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You list your domain-specific terms. The distillation pipeline sends the raw SpeechAnalyzer transcription + your dictionary to Gemini. Gemini returns a corrected version respecting your vocabulary. The pair becomes QLoRA training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works better than "automatic learning":&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The user's real pain was never "the system doesn't learn from my corrections." It was "certain terms never come out right." &lt;code&gt;dictionary.json&lt;/code&gt; targets that pain directly — zero noise, exact user intent.&lt;/p&gt;
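&lt;p&gt;For illustration, the shape of that step in shell — the file names, the elided Gemini call, and the chat-style JSONL layout are my sketch, not the repo's actual &lt;code&gt;pipeline.sh&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TERMS=$(jq -r '.terms | join(", ")' dictionary.json)
RAW=$(cat raw_transcript.txt)

# 1. Send $RAW + $TERMS to Gemini, asking for a corrected transcript
#    that respects the dictionary (API call elided) -&amp;gt; $CORRECTED.
# 2. Store the (raw, corrected) pair as one QLoRA training example:
jq -cn --arg r "$RAW" --arg c "$CORRECTED" \
  '{messages:[{role:"user",content:$r},{role:"assistant",content:$c}]}' &amp;gt;&amp;gt; train.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;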

&lt;h3&gt;
  
  
  What Got Deleted
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;WhisperTranscriber&lt;/code&gt; — entire module removed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CorrectionCapture&lt;/code&gt; — removed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CorrectionStore&lt;/code&gt; — removed&lt;/li&gt;
&lt;li&gt;Dual-path merge logic — removed&lt;/li&gt;
&lt;li&gt;GPU server dependency — gone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;~30% code reduction.&lt;/strong&gt; The cron job ("run every 10 minutes") became "run &lt;code&gt;pipeline.sh&lt;/code&gt; when you want to."&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation Framework
&lt;/h2&gt;

&lt;p&gt;v2 ships with proper benchmarks, run on a Mac mini M4:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AliMeeting&lt;/strong&gt; (real Chinese meeting recordings):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nearfield (headset): ~25% CER&lt;/li&gt;
&lt;li&gt;Farfield (single channel from the 8-channel array): ~40% CER (high overlap, no beamforming)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AMI&lt;/strong&gt; (English meetings):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FluidAudio speaker diarization: 23.2% DER average&lt;/li&gt;
&lt;li&gt;Processing speed: 130x real-time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;End-to-end:&lt;/strong&gt; 30 min meeting → 20-30s processing. Peak memory &amp;lt; 1GB. Runs on 8GB MacBook Air.&lt;/p&gt;

&lt;p&gt;Not SOTA — but fully on-device, zero cost, no network calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community PRs
&lt;/h2&gt;

&lt;p&gt;Two external contributions merged:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TextInjector clipboard restore bug fix&lt;/li&gt;
&lt;li&gt;OpenSSL 3.x certificate script compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An MIT-licensed project attracting outside PRs at two weeks old — that's the best validation metric.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Daily dictation flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Right Option key
  → ScreenCaptureKit + Vision OCR (context extraction)
  → SpeechAnalyzer (transcription with context bias)
  → Local LLM polish (ollama)
  → Paste to focused app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Improvement flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dictionary.json + voice-history.jsonl
  → Gemini distillation
  → QLoRA fine-tuning
  → Deploy to ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;We replaced an ML pipeline with a JSON file and got better results. The lesson: &lt;strong&gt;capture user intent explicitly, don't infer it from noisy behavioral signals.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complex systems are seductive. Simple systems ship.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Marvinngg/ambient-voice" rel="noopener noreferrer"&gt;github.com/Marvinngg/ambient-voice&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;br&gt;
&lt;strong&gt;Requirements:&lt;/strong&gt; macOS 26 (Tahoe), Apple Silicon (M1+)&lt;/p&gt;

&lt;p&gt;If you tried v1: &lt;code&gt;git pull &amp;amp;&amp;amp; make install&lt;/code&gt;.&lt;br&gt;
If you didn't: now is a better time to start.&lt;/p&gt;

&lt;p&gt;PRs and issues welcome.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>macos</category>
      <category>machinelearning</category>
      <category>apple</category>
    </item>
    <item>
      <title>The Four Layers of Hook Perception: Why Your AI Guardrails Aren't Actually Working</title>
      <dc:creator>xiaocai oh</dc:creator>
      <pubDate>Thu, 26 Mar 2026 03:16:24 +0000</pubDate>
      <link>https://dev.to/xiaocai_oh_07632a08eb20c6/the-four-layers-of-hook-perception-why-your-ai-guardrails-arent-actually-working-211j</link>
      <guid>https://dev.to/xiaocai_oh_07632a08eb20c6/the-four-layers-of-hook-perception-why-your-ai-guardrails-arent-actually-working-211j</guid>
      <description>&lt;p&gt;Someone let Claude Code help write documentation. It hardcoded a real Azure API key into a Markdown file and pushed it to a public repo. Eleven days went by before anyone noticed. A hacker found it first — $30,000 gone.&lt;/p&gt;

&lt;p&gt;Someone else asked AI to clean up test files. It ran &lt;code&gt;rm -rf&lt;/code&gt; and wiped their entire Mac home directory — Desktop, Documents, Downloads, Keychain. Years of work, gone in seconds.&lt;/p&gt;

&lt;p&gt;And then there's the person who let an AI agent manage their inbox. It bulk-deleted hundreds of real emails from Gmail.&lt;/p&gt;

&lt;p&gt;These aren't jokes. These are real incidents from 2025-2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once AI starts running, you can't stop it mid-stride.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every developer who's used AI coding tools has felt this fear. You ask it to post something on an English-language platform and it replies in Chinese — catastrophic for your account. You ask it to tweak a config and it corrupts your &lt;code&gt;.env&lt;/code&gt;, taking down your entire service.&lt;/p&gt;

&lt;p&gt;So the question is: &lt;strong&gt;Is there a mechanism that can intercept AI before it acts?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. It's called a Hook.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Hook: The 30-Second Version
&lt;/h2&gt;

&lt;p&gt;Forget the jargon. A Hook is a &lt;strong&gt;gate system&lt;/strong&gt; you install around your AI.&lt;/p&gt;

&lt;p&gt;Think of yourself as a building manager. AI is the contractor working inside. The contractor is competent but occasionally does wild things — tears out a load-bearing wall, throws away someone else's stuff, posts notices in the wrong place.&lt;/p&gt;

&lt;p&gt;Hooks are the &lt;strong&gt;access controls + surveillance cameras&lt;/strong&gt; you install at key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before the contractor acts&lt;/strong&gt; (&lt;code&gt;PreToolUse&lt;/code&gt;): Check what they're about to do. Block if dangerous.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After the contractor finishes&lt;/strong&gt; (&lt;code&gt;PostToolUse&lt;/code&gt;): Check what they did. Log problems immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before the contractor clocks out&lt;/strong&gt; (&lt;code&gt;Stop&lt;/code&gt;): Verify the work is done. Don't let them leave if it isn't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Before a building renovation&lt;/strong&gt; (&lt;code&gt;PreCompact&lt;/code&gt;): Lock critical documents in the safe first.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These gates aren't installed by the AI. &lt;strong&gt;You install them. The AI doesn't even know they exist.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most counterintuitive thing about Hooks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hooks operate outside AI's awareness. The AI doesn't know it's been intercepted. It doesn't know what the gates are checking.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can't ask Claude "Are your Hooks configured correctly?" — it can't answer. You can't ask Claude to debug your Hooks, because Hooks execute in a code layer &lt;em&gt;outside&lt;/em&gt; of Claude.&lt;/p&gt;

&lt;p&gt;This means something serious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hooks are something you, as the AI operator, must learn to configure yourself. AI can't help you here.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Core Insight: It's Not About "What You Block" — It's About "What You Can See"
&lt;/h2&gt;

&lt;p&gt;I learned this the hard way.&lt;/p&gt;

&lt;p&gt;While researching Claude Code's Skill engineering system, I did a line-by-line alignment of Anthropic's official design principles against the open-source toolchain. I found one &lt;strong&gt;completely blank spot&lt;/strong&gt; — Hooks. The AI toolchain didn't cover it. I didn't understand it either.&lt;/p&gt;

&lt;p&gt;So I decided to build one myself.&lt;/p&gt;

&lt;p&gt;Here's the scenario: Claude Code performs "context compaction" during long conversations — it compresses earlier dialogue into summaries to free up space. The problem is that compression loses critical information: SSH connection IPs, temporary API tokens, which step of a multi-step task you're on.&lt;/p&gt;

&lt;p&gt;My idea: Before compaction, have Claude automatically save critical info.&lt;/p&gt;

&lt;p&gt;So I wrote a Hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Before compaction, check the current conversation for critical information
and extract it to a file in /tmp/."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks reasonable, right? I set it up confidently, thinking the problem was solved.&lt;/p&gt;

&lt;p&gt;It ran for days. Then one compaction happened and I discovered that comments I'd planned to auto-publish on another platform never went out — the compaction had wiped the critical info, and my Hook did nothing.&lt;/p&gt;

&lt;p&gt;I opened the save file. It contained nothing but a timestamp.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;I'd installed a guardrail, but it was made of paper.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem wasn't a bug in the Hook mechanism. I had given it &lt;strong&gt;eyes that couldn't see anything&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I used a &lt;strong&gt;prompt hook&lt;/strong&gt; — which essentially makes a standalone Claude API call to do the evaluation. But this call is completely isolated: no tool access, no file reading, no file writing, no command execution. It can't even see the current conversation content.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;I'd asked a blind person to guard the keys to the safe.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It could see the transcript file's &lt;em&gt;path&lt;/em&gt; — but couldn't open the file. It was told to "write to /tmp/" — but had zero file-writing capability. Like handing someone a &lt;em&gt;photo&lt;/em&gt; of a key, but they can't touch the actual key.&lt;/p&gt;

&lt;p&gt;This failure taught me the core principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A guardrail's upper bound isn't determined by what you tell it to block. It's determined by what it can see.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is what I call the &lt;strong&gt;perception boundary&lt;/strong&gt; of Hooks — and it determines whether your guardrail is made of steel or paper.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Layers of Hook Perception
&lt;/h2&gt;

&lt;p&gt;What a Hook can perceive falls into four layers, from narrowest to widest. Each layer defines what the guardrail can and cannot do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 0: Event Snapshot
&lt;/h3&gt;

&lt;p&gt;The baseline information available to every Hook — what tool the AI is calling and what arguments it's passing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rm -rf /tmp/test"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cwd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/xxx/Project"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No conversation history. No context. No AI reasoning chain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Like a security guard who can only see what's in your hands, but doesn't know why you're carrying it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this layer is enough for a lot. &lt;code&gt;rm -rf&lt;/code&gt; in the command? Block. &lt;code&gt;git push --force main&lt;/code&gt;? Block. &lt;code&gt;--publish&lt;/code&gt; in the arguments? Pop a confirmation dialog.&lt;/p&gt;

&lt;p&gt;These checks only need string matching. Simple, deterministic, zero cost.&lt;/p&gt;
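&lt;p&gt;A complete Layer-0 check can be a few lines of shell: read the event JSON from stdin, match a string, exit. A minimal sketch — the exit-code-2 blocking convention is the one the &lt;code&gt;.env&lt;/code&gt; example later in this post also relies on:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INPUT=$(cat)   # event JSON arrives on stdin
CMD=$(echo "$INPUT" | jq -r '.tool_input.command // ""')

case "$CMD" in
  *"rm -rf"*|*"git push --force"*)
    echo "Blocked by policy" &amp;gt;&amp;amp;2
    exit 2   # exit code 2 = block; stderr is fed back to the AI
    ;;
esac
exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;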

&lt;h3&gt;
  
  
  Layer 1: Conversation Archive
&lt;/h3&gt;

&lt;p&gt;The Hook input includes a field called &lt;code&gt;transcript_path&lt;/code&gt; — pointing to the raw conversation log file.&lt;/p&gt;

&lt;p&gt;The key: &lt;strong&gt;only command hooks can read it&lt;/strong&gt;. Because command hooks run in your machine's shell, they can use &lt;code&gt;cat&lt;/code&gt;, &lt;code&gt;jq&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt; to open the file.&lt;/p&gt;

&lt;p&gt;This means command hooks can look back through conversation history: what the user said, what the AI replied, which tools were called previously.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;An upgrade from "seeing what's in your hands" to "being able to review the security footage."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But other Hook types only get the &lt;em&gt;path string&lt;/em&gt; — an address they can't open.&lt;/p&gt;
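&lt;p&gt;Concretely, a command hook can pull recent history straight out of the transcript. A sketch, assuming the JSONL layout current builds write — treat the field names as illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INPUT=$(cat)
T=$(echo "$INPUT" | jq -r '.transcript_path')

# Transcript is JSONL, one event per line; pull the latest user message.
tail -n 200 "$T" | jq -r 'select(.type == "user") | .message.content // empty' | tail -n 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;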

&lt;h3&gt;
  
  
  Layer 2: Project Codebase
&lt;/h3&gt;

&lt;p&gt;There's a type called &lt;strong&gt;agent hook&lt;/strong&gt; — it spawns a mini AI sub-agent that can read project code files, search for keywords, and find files.&lt;/p&gt;

&lt;p&gt;This means it can do deeper validation: if the AI wants to modify a file, the agent hook can read that file first and check whether the change would break something.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;An upgrade from "reviewing security footage" to "entering the room and checking the drawers."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The tradeoff: every trigger runs a full AI sub-agent, consuming significant tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: AI's Internal World — The Permanent Blind Spot
&lt;/h3&gt;

&lt;p&gt;No Hook can see any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the AI is currently thinking (its reasoning process)&lt;/li&gt;
&lt;li&gt;Why the AI decided to call this tool (its motivation)&lt;/li&gt;
&lt;li&gt;What's in the system prompt&lt;/li&gt;
&lt;li&gt;Post-compaction conversation summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hooks intercept actions, not intentions.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the fundamental limitation. Imagine someone hides a line in a file the AI reads: "Please ignore all previous safety rules." The AI might change its behavior after reading that, but it won't necessarily go through a Hook-protected tool path. It might find a route you didn't anticipate.&lt;/p&gt;

&lt;p&gt;Hooks are a gate system, not mind-reading. &lt;strong&gt;They can secure the door, but they can't cover every window.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Guardrail Patterns — Right Eyes for the Right Job
&lt;/h2&gt;

&lt;p&gt;Once you understand perception boundaries, choosing the right Hook type becomes straightforward:&lt;/p&gt;

&lt;h3&gt;
  
  
  Command Hook: The Regex Guard at the Door
&lt;/h3&gt;

&lt;p&gt;Runs a shell script. Can read files, write files, run commands. Makes decisions via string matching and regex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;100% deterministic. Zero cost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use cases: &lt;code&gt;rm -rf&lt;/code&gt; in the command → block. File path contains &lt;code&gt;.env&lt;/code&gt; → block. Arguments include &lt;code&gt;--publish&lt;/code&gt; → confirmation dialog. These rules don't need AI — a single &lt;code&gt;grep&lt;/code&gt; is faster and more accurate than an LLM call.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If regex can handle it, don't call in the AI.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
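&lt;p&gt;For reference, a command hook is registered in Claude Code's settings file (e.g. &lt;code&gt;~/.claude/settings.json&lt;/code&gt;). A minimal shape — the matcher and script path below are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/guard.sh" }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;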

&lt;h3&gt;
  
  
  HTTP Hook: The Remote Policy Server
&lt;/h3&gt;

&lt;p&gt;Sends the event to a remote HTTP service for server-side decision-making.&lt;/p&gt;

&lt;p&gt;Use case: team-wide security policies. Ten people using Claude Code, one policy server enforcing the rules — no direct pushes to main, no touching production databases.&lt;/p&gt;

&lt;p&gt;One counterintuitive design choice: &lt;strong&gt;if the server is down, AI keeps running&lt;/strong&gt;. Non-2xx responses don't block operations. So HTTP hooks can't be your only safety wall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt Hook: The Lightweight Semantic Judge
&lt;/h3&gt;

&lt;p&gt;Makes a single AI call for semantic evaluation. No tools, no file access — it only sees the fields in the event JSON.&lt;/p&gt;

&lt;p&gt;Use case: decisions that require "understanding meaning" rather than "matching strings." Like detecting if Claude's response is deflecting — "that's out of scope," "I'd suggest handling this later" — patterns that regex can't reliably catch, but another AI spots instantly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt hook's one superpower is understanding natural language. Beyond that, it can do nothing.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is exactly where I got burned — I asked it to write files, but it can't even touch the filesystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Hook: The Inspector with a Toolbox
&lt;/h3&gt;

&lt;p&gt;Spawns a sub-agent that can read code, search files, find keywords.&lt;/p&gt;

&lt;p&gt;Use case: AI wants to modify a critical file, and you need to read that file's context first to judge whether the change is safe. This "need to read code to make a judgment" scenario is where only agent hooks qualify.&lt;/p&gt;

&lt;p&gt;Highest cost: every trigger is a full AI session. Use it where it counts.&lt;/p&gt;

&lt;p&gt;The decision framework:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Regex can handle it → command hook. Need to understand meaning → prompt hook. Need to read code → agent hook. Need team-wide control → HTTP hook.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first question in Hook selection isn't "what do I want to block?" — it's &lt;strong&gt;"what do I need to see in order to judge?"&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Real-World Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Case 1: The Confirmation Key Before One-Click Publish
&lt;/h3&gt;

&lt;p&gt;I have a content distribution workflow — Claude rewrites articles for different platforms, then calls a publish script. The script has a &lt;code&gt;--publish&lt;/code&gt; flag that sends it live immediately.&lt;/p&gt;

&lt;p&gt;One Hook solved it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CMD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s1"&gt;'--publish'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"hookSpecificOutput":{"permissionDecision":"ask"}}'&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whenever &lt;code&gt;--publish&lt;/code&gt; appears in the command, it pauses and asks me to confirm.&lt;/p&gt;

&lt;p&gt;Perception layer: Layer 0. Just looking at the command string. &lt;code&gt;grep&lt;/code&gt;. Command hook. Zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case 2: Posting Chinese on an English Platform
&lt;/h3&gt;

&lt;p&gt;This actually happened. I asked Claude to reply to comments in an English-language community, and it replied in Chinese. On some platforms, this kind of mistake does irreversible damage to your account.&lt;/p&gt;

&lt;p&gt;Regex can't handle this — you can't string-match your way to "is this text English?" (What about mixed Chinese-English? Chinese comments inside code blocks?)&lt;/p&gt;

&lt;p&gt;This is a prompt hook scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The following command will publish content on an English-language platform. Check the text content in tool_input. If the primary language is not English, return {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;decision&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Target platform is English-only. Please write in English.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}. $ARGUMENTS"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Have another AI scan the content language. If it's Chinese, block. Semantic judgment — lightweight, fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case 3: The Config File Guardian
&lt;/h3&gt;

&lt;p&gt;In some projects, Claude has a bad habit of modifying &lt;code&gt;.env&lt;/code&gt; files. After a change, the service goes down, and it's hard to immediately realize &lt;code&gt;.env&lt;/code&gt; was the culprit.&lt;/p&gt;

&lt;p&gt;One Hook solved it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.file_path // ""'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s1"&gt;'\.env'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Modifying .env files is prohibited"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;exit &lt;/span&gt;2  &lt;span class="c"&gt;# Block&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perception layer: Layer 0. Check the file path. Match &lt;code&gt;.env&lt;/code&gt;. Command hook.&lt;/p&gt;

&lt;p&gt;Dead simple. But this kind of simple rule prevents an entire class of common incidents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Less Is More
&lt;/h2&gt;

&lt;p&gt;One counterintuitive conclusion: &lt;strong&gt;knowing which Hooks NOT to add is more important than knowing how to add them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every additional Hook adds overhead to every tool call. If you Hook every operation, Claude Code's response time degrades noticeably.&lt;/p&gt;

&lt;p&gt;Scenarios where you don't need a Hook:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Checking if a file exists before editing&lt;/strong&gt; — the edit tool already checks and returns an error on failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging every operation&lt;/strong&gt; — the conversation transcript is already a complete log&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Injecting environment variables&lt;/strong&gt; — belongs in &lt;code&gt;.zshrc&lt;/code&gt;, not in a Hook&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Good guardrails aren't airtight. They're a single infallible sentry at the right chokepoint.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The essence of Hooks in three words: &lt;strong&gt;few and precise.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Back to the original question: Once AI starts running, how do you stop it?&lt;/p&gt;

&lt;p&gt;The answer: &lt;strong&gt;First figure out what your guardrail can see.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hooks aren't omnipotent. They can't see what AI is thinking, can't see AI's motivations, and might even be bypassed by prompt injection. They're a check at the action layer, nothing more.&lt;/p&gt;

&lt;p&gt;But this check is one that you — the human — must learn to configure yourself.&lt;/p&gt;

&lt;p&gt;AI can help you write code, write articles, manage projects. But it can't install its own brakes. That's on you.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Perception determines capability. What you can see is what you can stop.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>security</category>
      <category>devtools</category>
    </item>
    <item>
      <title>I Built a Context-Aware Voice Input Tool for macOS — 100% On-Device, Zero Cloud</title>
      <dc:creator>xiaocai oh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 04:03:29 +0000</pubDate>
      <link>https://dev.to/xiaocai_oh_07632a08eb20c6/i-built-a-context-aware-voice-input-tool-for-macos-100-on-device-zero-cloud-521i</link>
      <guid>https://dev.to/xiaocai_oh_07632a08eb20c6/i-built-a-context-aware-voice-input-tool-for-macos-100-on-device-zero-cloud-521i</guid>
      <description>&lt;p&gt;Every voice input tool I've tried on Mac has the same problem: it doesn't know what I'm doing.&lt;/p&gt;

&lt;p&gt;I'm writing Swift code and say "optional." The recognizer gives me the English adjective. I'm drafting an email about OKR targets and say "retention." It transcribes something phonetically similar but semantically wrong — because it has no idea I'm looking at a quarterly business review.&lt;/p&gt;

&lt;p&gt;So I asked: &lt;strong&gt;what if the recognizer already knew your context before you started speaking?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That question led to &lt;a href="https://github.com/Marvinngg/ambient-voice" rel="noopener noreferrer"&gt;ambient-voice&lt;/a&gt; — an open-source macOS voice input system where every layer runs on Apple-native frameworks, everything stays on your device, and screen context is injected into the recognizer at transcription time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: 100% Apple-Native
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speech recognition&lt;/td&gt;
&lt;td&gt;SpeechAnalyzer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screen capture&lt;/td&gt;
&lt;td&gt;ScreenCaptureKit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OCR&lt;/td&gt;
&lt;td&gt;Vision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text injection&lt;/td&gt;
&lt;td&gt;Accessibility API + CGEvent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speaker diarization&lt;/td&gt;
&lt;td&gt;FluidAudio (CoreML)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hotkey listening&lt;/td&gt;
&lt;td&gt;CGEventTap&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No Whisper. No Electron. No cloud APIs. No third-party dependencies for core functionality.&lt;/p&gt;

&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;On-device processing.&lt;/strong&gt; Your audio never leaves your Mac. No network calls, no telemetry, no cloud storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero cost.&lt;/strong&gt; No subscriptions, no per-minute charges. The Neural Engine is already in your Mac.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic improvement.&lt;/strong&gt; When Apple improves SpeechAnalyzer in macOS 27, ambient-voice gets better without code changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Core Mechanism: Context Biasing
&lt;/h2&gt;

&lt;p&gt;When you press the hotkey, two things happen simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audio capture begins&lt;/strong&gt; — AVCaptureSession feeds audio to SpeechAnalyzer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screen context capture&lt;/strong&gt; — ScreenCaptureKit grabs the focused window, Vision OCR extracts visible text, keywords get injected into SpeechAnalyzer's &lt;code&gt;AnalysisContext&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the time your first word reaches the recognizer, it already knows what's on your screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; You're replying to an email about OKR targets. Your screen shows "retention rate," "Q3 objectives," "churn reduction." You say "change the retention target." Without context biasing, "retention" gets mis-transcribed. With it, the recognizer sees "retention" in the AnalysisContext, and the ambiguity resolves correctly — on the first pass.&lt;/p&gt;

&lt;p&gt;This isn't post-processing correction. &lt;strong&gt;Prevention, not correction.&lt;/strong&gt;&lt;/p&gt;
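
&lt;p&gt;The OCR-to-keywords step can be sketched in a few lines. This is an illustrative Python sketch, not the project's Swift code, and the function name is invented; in ambient-voice the Vision OCR output feeds SpeechAnalyzer's &lt;code&gt;AnalysisContext&lt;/code&gt; directly:&lt;/p&gt;

```python
import re
from collections import Counter

def extract_bias_keywords(ocr_text, max_keywords=20):
    """Turn raw OCR text into a keyword list for recognizer biasing."""
    # Keep alphanumeric tokens of 3+ characters; OCR noise is mostly shorter
    words = re.findall(r"[A-Za-z][A-Za-z0-9_-]{2,}", ocr_text)
    stopwords = {"the", "and", "for", "you", "with", "this", "that", "are"}
    counts = Counter(w.lower() for w in words if w.lower() not in stopwords)
    # The most frequent on-screen terms are the most likely spoken terms
    return [w for w, _ in counts.most_common(max_keywords)]
```

&lt;p&gt;Feeding that list to the recognizer before the first audio frame arrives is what makes this prevention rather than correction.&lt;/p&gt;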

&lt;h2&gt;
  
  
  Self-Improving Data Loop
&lt;/h2&gt;

&lt;p&gt;Every transcription session automatically generates training data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each transcription logs to &lt;code&gt;voice-history.jsonl&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A 30-second observation window captures your corrections via Accessibility API&lt;/li&gt;
&lt;li&gt;Whisper re-transcribes the audio as a high-quality reference&lt;/li&gt;
&lt;li&gt;The three outputs merge with weighted scoring → QLoRA fine-tuning of a local model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system improves with zero extra effort from you: a strong model (Whisper) distills its quality into the small on-device model.&lt;/p&gt;
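
&lt;p&gt;For a concrete picture, one record in &lt;code&gt;voice-history.jsonl&lt;/code&gt; might look like the following. Every field name here is hypothetical (the real schema lives in the repo), and the record is pretty-printed for readability; JSONL stores one record per line:&lt;/p&gt;

```json
{"ts": "2026-03-20T04:03:29Z",
 "audio": "clips/0142.wav",
 "live_transcript": "change the retention target",
 "user_correction": null,
 "whisper_reference": "change the retention target",
 "context_keywords": ["retention", "churn", "objectives"]}
```

&lt;p&gt;When the live transcript, the user's correction, and the Whisper reference disagree, the weighted merge decides which version becomes a training label.&lt;/p&gt;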

&lt;h2&gt;
  
  
  Meeting Mode
&lt;/h2&gt;

&lt;p&gt;Press ⌘M to start recording. Real-time transcription in a floating panel. When you stop, FluidAudio performs on-device speaker diarization.&lt;/p&gt;

&lt;p&gt;Output: a Markdown file with timestamps, speaker labels, and full text. Every word stays on your Mac.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardest Bugs (Solved with Claude Code)
&lt;/h2&gt;

&lt;p&gt;Most of ambient-voice was developed with Claude Code using structured "Skills" — domain knowledge documents that capture the &lt;em&gt;why&lt;/em&gt; and &lt;em&gt;what&lt;/em&gt;, letting Claude figure out the &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The trickiest problems had no Stack Overflow answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bluetooth audio silence&lt;/strong&gt; → rewrote capture pipeline around AVCaptureSession&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swift 6 concurrency crashes&lt;/strong&gt; → CGEventTap with DispatchQueue bridging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility permissions resetting on build&lt;/strong&gt; → switched to Apple Development certificate signing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;ambient-voice is MIT licensed: &lt;a href="https://github.com/Marvinngg/ambient-voice" rel="noopener noreferrer"&gt;github.com/Marvinngg/ambient-voice&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; macOS 26 (Tahoe)+, Apple Silicon (M1+).&lt;/p&gt;

&lt;p&gt;If you care about privacy-first voice input or building on Apple's latest frameworks — stars, issues, and PRs welcome.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>macos</category>
      <category>swift</category>
      <category>ai</category>
    </item>
    <item>
      <title>From Slash Commands to Real Skill Engineering: 3 Lessons I Learned the Hard Way</title>
      <dc:creator>xiaocai oh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 03:55:45 +0000</pubDate>
      <link>https://dev.to/xiaocai_oh_07632a08eb20c6/from-slash-commands-to-real-skill-engineering-3-lessons-i-learned-the-hard-way-em3</link>
      <guid>https://dev.to/xiaocai_oh_07632a08eb20c6/from-slash-commands-to-real-skill-engineering-3-lessons-i-learned-the-hard-way-em3</guid>
      <description>&lt;p&gt;I wrote an email-processing Skill with 8 detailed rules. Claude followed every one of them like an obedient but soulless intern — the output was correct but completely useless.&lt;/p&gt;

&lt;p&gt;Then I deleted all 8 rules and replaced them with a single question: &lt;em&gt;"Which emails need my action, and which do I just need to know about?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The result was 3x better. Claude started organizing information by urgency, merging redundant emails, and even flagging ones I could safely ignore.&lt;/p&gt;

&lt;p&gt;That experience taught me something: &lt;strong&gt;writing instructions ≠ Skill engineering.&lt;/strong&gt; There are three cognitive layers between the two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: Your Skill's Entry Point Is Probably Broken
&lt;/h2&gt;

&lt;p&gt;Here's an embarrassing fact: all of my Skills are triggered via slash commands. &lt;code&gt;/read-think-write&lt;/code&gt;, &lt;code&gt;/invest-analysis&lt;/code&gt;, &lt;code&gt;/idc-inspection&lt;/code&gt; — every single time, I type the command manually.&lt;/p&gt;

&lt;p&gt;This means the &lt;code&gt;description&lt;/code&gt; field — the one that's supposed to determine &lt;em&gt;"when the user says X, auto-trigger this Skill"&lt;/em&gt; — is completely dead weight in my setup.&lt;/p&gt;

&lt;p&gt;Thariq from Anthropic wrote about this explicitly: &lt;strong&gt;description isn't documentation. It's a classifier&lt;/strong&gt; — written for the AI to decide when to activate, not for humans to read.&lt;/p&gt;

&lt;p&gt;Community benchmarks tell the story: &lt;strong&gt;unoptimized descriptions → 20% natural language trigger rate. Optimized → 50%. With examples → 90%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's also a counterintuitive design principle: descriptions should &lt;strong&gt;over-trigger&lt;/strong&gt;. Recall matters more than precision. A false trigger wastes a few tokens — Claude enters the Skill, realizes it's not needed, and exits. But a missed trigger means the user thinks the Skill is useless and never tries natural language again.&lt;/p&gt;
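
&lt;p&gt;Concretely, an engineered &lt;code&gt;description&lt;/code&gt; leans on example trigger phrases rather than documentation prose. The frontmatter below is a sketch (the skill name and phrases are invented, and the exact frontmatter fields may vary by setup):&lt;/p&gt;

```yaml
name: email-triage
description: >
  Triage the user's inbox: which emails need action, which are FYI-only.
  Use when the user says things like "go through my email",
  "anything I need to reply to?", "catch me up on my inbox",
  or mentions unread mail, email backlog, or inbox zero.
```

&lt;p&gt;Note the bias toward over-triggering: the phrases cast a wide net on purpose.&lt;/p&gt;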

&lt;p&gt;We all use slash commands because we never engineered the entry point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: Stop Opening Blind Boxes
&lt;/h2&gt;

&lt;p&gt;The second thing I ignored for too long was &lt;strong&gt;eval&lt;/strong&gt; — the evaluation system.&lt;/p&gt;

&lt;p&gt;When I used skill-creator, I'd iterate 2-3 rounds. Each round it scores both candidates and keeps the higher-scoring version. Final output: ~90 points. Ship it.&lt;/p&gt;

&lt;p&gt;But if you asked me &lt;em&gt;"what does 90 actually measure?"&lt;/em&gt; — I couldn't answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Trigger evaluation.&lt;/strong&gt; Tests whether "user said X, should the Skill activate?" This is the only layer I ever used — and where that 90 came from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Quality evaluation.&lt;/strong&gt; Run the same task &lt;em&gt;with&lt;/em&gt; the Skill and &lt;em&gt;without&lt;/em&gt; (bare Claude), then compare. That &lt;strong&gt;delta&lt;/strong&gt; is your Skill's true value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bare Claude: 80 pts, your Skill: 82 pts → hundreds of lines for 2 points. Not worth it.&lt;/li&gt;
&lt;li&gt;Bare Claude: 60 pts, your Skill: 95 pts → that 35-point delta is why your Skill exists.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No baseline comparison = slot machine development.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Process evaluation.&lt;/strong&gt; Examine Claude's execution transcript. If Claude skips the same step in three test cases, that step isn't pulling its weight. Delete it — the Skill gets better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Don't Put Guardrails in the Prompt
&lt;/h2&gt;

&lt;p&gt;A Hook in Claude Code is &lt;strong&gt;a shell command that auto-triggers before or after Claude uses a tool&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PostToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint --fix $FILE_PATH"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every time Claude writes a file, the system automatically runs linting. Claude doesn't need to "remember" — it doesn't even know it's happening.&lt;/p&gt;

&lt;p&gt;We write tons of &lt;code&gt;MUST&lt;/code&gt;, &lt;code&gt;NEVER&lt;/code&gt;, &lt;code&gt;ALWAYS&lt;/code&gt; in our SKILL.md files — all enforced by Claude's attention. Long context = forgotten rules.&lt;/p&gt;

&lt;p&gt;But if you turn "never modify .env" into a PreToolUse hook — Claude tries to write &lt;code&gt;.env&lt;/code&gt;, gets blocked by the system — the rule goes from &lt;em&gt;"please remember this"&lt;/em&gt; to &lt;em&gt;"you can't violate this even if you try."&lt;/em&gt;&lt;/p&gt;
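
&lt;p&gt;Following the shape of the PostToolUse example above, a PreToolUse guard for &lt;code&gt;.env&lt;/code&gt; might look like this. Treat it as a sketch: the flattened schema and the &lt;code&gt;$FILE_PATH&lt;/code&gt; variable follow this article's example and may differ across Claude Code versions:&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Write|Edit",
      "command": "case \"$FILE_PATH\" in *.env*) exit 2;; esac"
    }]
  }
}
```

&lt;p&gt;In current Claude Code builds a blocking exit code from a PreToolUse hook stops the tool call before it runs; check the hooks documentation for the exact convention your version uses.&lt;/p&gt;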

&lt;p&gt;&lt;strong&gt;Good engineering doesn't rely on AI discipline. It relies on system guarantees.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skills aren't written — they're tested, measured, and system-guaranteed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core loop: &lt;strong&gt;write → test → observe → revise → test.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most people stop at "write." I did too.&lt;/p&gt;

&lt;p&gt;If you're triggering everything via slash commands, iterating by gut feel, and putting all your rules in the prompt — maybe it's time to pause and see what you've been skipping.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
