<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Errata Hunter</title>
    <description>The latest articles on DEV Community by Errata Hunter (@erratahunter).</description>
    <link>https://dev.to/erratahunter</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3797510%2F78826cc2-a514-4c4b-8e2f-a575c16e80f4.png</url>
      <title>DEV Community: Errata Hunter</title>
      <link>https://dev.to/erratahunter</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/erratahunter"/>
    <language>en</language>
    <item>
      <title>How to Actually Use AI Coding Agents — 6 Skill-Specific Tips</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Thu, 23 Apr 2026 21:35:11 +0000</pubDate>
      <link>https://dev.to/erratahunter/how-to-actually-use-ai-coding-agents-6-skill-specific-tips-4ohe</link>
      <guid>https://dev.to/erratahunter/how-to-actually-use-ai-coding-agents-6-skill-specific-tips-4ohe</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The real problem with AI coding is not model quality but &lt;strong&gt;which stage you hand what to&lt;/strong&gt; — a nonexistent &lt;code&gt;CONFIG_SPI_NRFX_SPIM3&lt;/code&gt; in Zephyr passing the build and then bricking boot is the proof.&lt;/li&gt;
&lt;li&gt;The fix is not a loop but &lt;strong&gt;gates&lt;/strong&gt; — split the work into Research → Fact-Check → Plan → Fact-Check → Implement → Debug → Review, and run Fact-Check twice in &lt;strong&gt;independent sessions&lt;/strong&gt;, right after research and right after planning.&lt;/li&gt;
&lt;li&gt;Separate each gate into skill (instruction) and hook (contract) — put deterministic checks like banning &lt;code&gt;any&lt;/code&gt;/&lt;code&gt;void*&lt;/code&gt; into hooks such as &lt;code&gt;auto-typecheck.sh&lt;/code&gt;, and keep personas and prompt patterns in SKILL.md. The whole pipeline then compounds.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1. Where Vibe Coding Breaks
&lt;/h2&gt;

&lt;p&gt;Last year I was adding one more BLE sensor node to a Zephyr-based nRF52 firmware. I threw a one-liner at Claude Code — "enable the SPI driver" — and a single clean line landed in &lt;code&gt;prj.conf&lt;/code&gt;: &lt;code&gt;CONFIG_SPI_NRFX_SPIM3=y&lt;/code&gt;. The build passed. The binary flashed. The board would not boot. Thirty minutes of digging later, it hit me — &lt;strong&gt;that symbol does not exist anywhere in Zephyr&lt;/strong&gt;. The correct answer on this chip family is &lt;code&gt;CONFIG_SPI_NRFX_SPIM&lt;/code&gt; plus a Devicetree node activation. The symbol the AI had synthesized was silently dropped by the Kconfig parser with a single "unknown symbol, ignoring" warning, buried somewhere in 800 lines of build log.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://simonwillison.net/2025/Mar/2/hallucinations-in-code/" rel="noopener noreferrer"&gt;Simon Willison&lt;/a&gt; wrote in March 2025 that "hallucinations in code are the least dangerous form of LLM mistakes." The reasoning is clean — you run the code and the error yells at you. Call a method that doesn't exist and the stack trace shouts, then paste it back into the agent. Done. Willison made this as a general claim, not restricted to a language or domain. It holds on the web. It holds in a Python REPL. It did not hold in my firmware. &lt;strong&gt;The moment the assumption "run it and the error surfaces" collapses, the entire run-and-detect feedback loop loses its meaning.&lt;/strong&gt; The compile passed, the build passed, the binary flashed, and the board quietly turned into a brick. Willison's optimism did not protect me there.&lt;/p&gt;

&lt;p&gt;I did not want to file this under "Claude Code isn't smart enough yet." I tried the same prompt against GPT-5 and Gemini and got similar results. The problem was not &lt;strong&gt;AI quality&lt;/strong&gt; but &lt;strong&gt;where I had placed the AI in my process&lt;/strong&gt;. I was expecting "verified output" from a generation stage. Generation is the stage where hallucination is natural; verification has to happen somewhere else. The empty seat was not the AI's to fill — it was mine.&lt;/p&gt;

&lt;p&gt;Through my time using AI coding agents, I translated that lesson from code into the &lt;strong&gt;shape of a pipeline&lt;/strong&gt;. Pieces I'd built at different moments — a &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;Kconfig verification hook&lt;/a&gt;, a &lt;a href="https://reversetobuild.com/ai-firmware-development-workflow/" rel="noopener noreferrer"&gt;gate-based workflow&lt;/a&gt;, an &lt;a href="https://reversetobuild.com/firmware-hil-ci-pipeline/" rel="noopener noreferrer"&gt;HIL CI feedback loop&lt;/a&gt; — only in hindsight did I see they were all answering the same question: &lt;em&gt;at which stage, with what verification, do I hand work to the AI?&lt;/em&gt; This essay is my current answer. Six skills, six gates, and the failures and trade-offs I hit at each seat.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Six Skills as a Frame — Gates, Not a Loop
&lt;/h2&gt;

&lt;p&gt;Many AI coding guides talk about a "loop" — research, plan, implement, review, back to research. Circles are pretty but they did not match my experience. Circles have &lt;strong&gt;nothing to pass through&lt;/strong&gt;. I started seeing the process as &lt;strong&gt;gates&lt;/strong&gt; instead. Each stage has a pass condition — "don't verify this and the next stage gets poisoned" — and the mechanism that holds that condition is its own thing.&lt;/p&gt;

&lt;p&gt;Here is the shape of my pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsjqdyle3w31jhu0r7yq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsjqdyle3w31jhu0r7yq.png" alt="Diagram of an AI coding pipeline connecting six skill gates from Research to Review, with Fact-Check inserted twice — after Research and after Plan" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Research → Fact-Check → Plan → Fact-Check → Implement → Debug → Review. Why Fact-Check appears twice is explained below and in Section 4.&lt;/p&gt;

&lt;p&gt;The odd part here is that &lt;strong&gt;Fact-Check appears twice&lt;/strong&gt; — once right after Research, once right after Plan. I also thought "once is enough" at first. Then I ran into a pattern several times: the research was collected cleanly, but &lt;strong&gt;the assumptions the AI added during planning&lt;/strong&gt; were wrong. Implicit premises like "this library supports that platform" or "this API already exists in v2.4." These were not facts from research but &lt;strong&gt;new claims the planner introduced&lt;/strong&gt;, and they needed a separate teardown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://addyosmani.com/blog/ai-coding-workflow/" rel="noopener noreferrer"&gt;Addy Osmani's 2026 workflow&lt;/a&gt; has five stages. &lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;The Claude Code docs&lt;/a&gt; use four: Explore → Plan → Code. &lt;a href="https://cursor.com/blog/agent-best-practices" rel="noopener noreferrer"&gt;Cursor's best practices&lt;/a&gt; — interestingly — barely use the word "hallucination" in the body and instead say "AI-generated code can look right while being subtly wrong." All three see the same phenomenon. The difference is the number of gates, and &lt;strong&gt;the number of gates scales with the feedback latency of the domain&lt;/strong&gt;. Web and scripting get run-to-error feedback in seconds, so two or three gates are enough. Firmware sits with tens of minutes between compile and boot, and weeks between boot and "no intermittent bug." You need more gates — and not just more, but &lt;strong&gt;different kinds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is why I call them gates rather than loops. A loop is a question of how many times you go around; a gate is a question of &lt;strong&gt;where you put what&lt;/strong&gt;. The latter is system design, the former is operational feel. This essay sits on the system-design side.&lt;/p&gt;

&lt;p&gt;What the AI is good at and bad at also shifts per stage. At Research the AI is an excellent "keyword expander" and a terrible fact checker. At Debug it flips — the AI is an excellent log reader, and here I actually get better results by stepping back. Splitting the work into six skills is how I avoid losing that &lt;strong&gt;role inversion&lt;/strong&gt;. Lump them together and everything collapses into "the AI just isn't great."&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Skill 1 — Research: Excellent Assistant, Terrible Fact Checker
&lt;/h2&gt;

&lt;p&gt;At the research stage the AI is especially good at three things. Keyword expansion (say "BLE 5.3 periodic advertising" and fifteen adjacent terms come out). Comparison tables (current draw, RX sensitivity, BOM cost across chip A/B/C). Document summarization (the two paragraphs I want from a 60-page datasheet). Use those three well and you research two to three times faster than alone.&lt;/p&gt;

&lt;p&gt;The trouble starts right after. The AI &lt;strong&gt;cannot judge source credibility&lt;/strong&gt;, &lt;strong&gt;cannot guarantee recency&lt;/strong&gt;, and &lt;strong&gt;cannot verify domain-specific accuracy&lt;/strong&gt;. I once trusted an AI summary over the datasheet and reversed a register bit order — a bit that had flipped between chip revisions A and B. The AI confidently served the revision-A answer. Half a day gone to debugging. The problem was not that the summary was wrong; the problem was that I had not built a way to check &lt;strong&gt;whether the summary was wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So my Research stage now carries three hard-coded rules. First, &lt;strong&gt;source tagging&lt;/strong&gt;. Every entry in the findings file gets labeled as &lt;code&gt;[official]&lt;/code&gt;, &lt;code&gt;[community]&lt;/code&gt;, or &lt;code&gt;[AI inference]&lt;/code&gt;. That one-word tag decides the "what to doubt first" order at the Fact-Check stage. Second, &lt;strong&gt;concrete query design&lt;/strong&gt;. "Find me BLE OTA docs" is a bad prompt; "official docs, release notes, and &lt;code&gt;ncs-*&lt;/code&gt; tag commit messages for Nordic nRF Connect SDK 2.5's MCUboot swap algorithm" is a good one. The latter forces the AI to &lt;strong&gt;choose where to look&lt;/strong&gt;. Third, &lt;strong&gt;persistence to &lt;code&gt;.md&lt;/code&gt;&lt;/strong&gt;. Research output always accumulates in one &lt;code&gt;findings.md&lt;/code&gt;. Sessions can drop, context can compact — the information survives and flows cleanly into the next stage.&lt;/p&gt;
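&lt;p&gt;As a sketch, tagged entries in &lt;code&gt;findings.md&lt;/code&gt; might look like this (the claims themselves are illustrative placeholders, not verified facts):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## MCUboot swap algorithm (NCS 2.5)
- [official] NCS 2.5 release notes: names the default swap algorithm
- [community] DevZone thread: reports a scratch-partition gotcha on this board
- [AI inference] "swap should survive power loss mid-update" -- verify first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The tag order is also the doubt order for the Fact-Check stage: &lt;code&gt;[AI inference]&lt;/code&gt; entries get torn down first.&lt;/p&gt;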

&lt;p&gt;&lt;strong&gt;If you freeze it as a file&lt;/strong&gt; — if you decide to embed the Research stage as a skill, pin four things into the skill text. ① An interface that takes "request (topic, scope)" as an argument. ② An output that &lt;strong&gt;appends&lt;/strong&gt; to a single markdown file rather than overwriting. ③ A directive at the top: "analyze deeply and record the details thoroughly" (that single sentence roughly doubles or triples the perceived summary depth). ④ The rule that matters most — &lt;strong&gt;do not modify any file other than the one this skill writes&lt;/strong&gt;. Miss the fourth and the day comes when the AI says "while I was at it, I also fixed &lt;code&gt;main.c&lt;/code&gt;." Unverified edits slip into the research stage, and Fact-Check ends up breaking already-polluted input. Practically, scope tool permissions to a single write path: &lt;code&gt;allowed-tools: Read, Grep, Glob, WebFetch, Write(findings.md)&lt;/code&gt;. The Claude Code skills docs recommend exactly this shape.&lt;/p&gt;
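&lt;p&gt;A minimal sketch of that skill file, assuming Claude Code's &lt;code&gt;SKILL.md&lt;/code&gt; frontmatter shape (the description and body wording are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: research
description: Collect findings on a topic and append them to findings.md
allowed-tools: Read, Grep, Glob, WebFetch, Write(findings.md)
---

Analyze deeply and record the details thoroughly.

Input: a request (topic, scope).
Output: append to findings.md -- never overwrite, never touch any other file.
Tag every entry [official], [community], or [AI inference].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;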

&lt;h2&gt;
  
  
  4. Skill 2 — Fact-Check: Break the Plan Before You Ship It
&lt;/h2&gt;

&lt;p&gt;Fact-Check will be the strangest-sounding section here. Most guides do not place a dedicated verification stage between research and implementation. I place &lt;strong&gt;two&lt;/strong&gt;. One right after research, one right after planning. To explain why, I need to start with a pattern I kept hitting.&lt;/p&gt;

&lt;p&gt;If you have ever told an AI, inside the same session, "find what's wrong with what you just researched," you know how subtly disappointing the result is. The AI leans toward &lt;strong&gt;confirming its own answer&lt;/strong&gt;. It will fix typos and small wording, but the big claims — "this chip supports that feature" — usually survive. I first blamed the model. Then I saw Anthropic's automated red teaming work from 2024 and changed my mind. One model generates attacks and &lt;strong&gt;a different model defends&lt;/strong&gt;. The match does not exist inside a single model, a single session. The industry had already converged on "only verification in an independent session counts." Addy Osmani calls this "secondary AI sessions to critique primary outputs." The Claude Code docs recommend a Writer/Reviewer pattern and explain it in one line: "A fresh context improves code review since Claude won't be biased toward code it just wrote." That bias is exactly the subtle disappointment I kept feeling.&lt;/p&gt;

&lt;p&gt;So my Fact-Check skill does four things.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;it forces an independent session&lt;/strong&gt;. The researching session and the fact-checking session do not share context. The skill takes only a file path as input and reads from there as if seeing the document for the first time. I deliberately build the setup of handing a paper to someone who does not know the answer.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;it hunts "things that don't exist" deterministically&lt;/strong&gt;. In my experience this is the most dangerous family of hallucinations. The principle behind &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;the Kconfig-symbol verification hook&lt;/a&gt; is simple — extract every symbol mentioned in the research or plan and &lt;code&gt;grep&lt;/code&gt; the actual Kconfig tree to confirm each one. Present → pass; absent → stamp &lt;code&gt;[TBD: needs fact-check]&lt;/code&gt; and report. What matters is that it is a deterministic file-existence check, not a probabilistic AI judgment. &lt;strong&gt;You wrap nondeterministic generation in deterministic verification&lt;/strong&gt; — that is exactly what the word "gate" means here. Recently an academic version of the same idea appeared. &lt;a href="https://arxiv.org/abs/2509.09970" rel="noopener noreferrer"&gt;arXiv 2509.09970&lt;/a&gt; validates GPT-4-generated FreeRTOS firmware in QEMU, categorizes faults into buffer overflow (CWE-120), race condition (CWE-362), and DoS (CWE-400), runs fuzzing, static analysis, and runtime checks through a three-stage agent loop, and reports a &lt;strong&gt;92.4% Vulnerability Remediation Rate and a 37.3% improvement margin&lt;/strong&gt;. The numbers are tied to that paper's sample, but the design principle — close generation's nondeterminism with verification's determinism — is the same as my hook's.&lt;/p&gt;
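&lt;p&gt;A minimal sketch of that deterministic check, assuming GNU &lt;code&gt;grep&lt;/code&gt; and a Zephyr-style tree where symbols are declared as &lt;code&gt;config NAME&lt;/code&gt; lines (the script name and paths are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/sh
# check-symbols.sh -- deterministic gate: every CONFIG_* symbol the plan
# mentions must actually be declared somewhere in the real Kconfig tree.
# Usage: sh check-symbols.sh plan.md path/to/kconfig/tree
PLAN="${1:-plan.md}"
TREE="${2:-zephyr}"
status=0
for sym in $(grep -oE 'CONFIG_[A-Z0-9_]+' "$PLAN" | sort -u); do
  name="${sym#CONFIG_}"          # declarations drop the CONFIG_ prefix
  if grep -rqE "^(menu)?config +$name\b" "$TREE"; then
    echo "PASS $sym"
  else
    echo "FAIL $sym  [TBD: needs fact-check]"
    status=1
  fi
done
exit $status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A hallucinated symbol like &lt;code&gt;CONFIG_SPI_NRFX_SPIM3&lt;/code&gt; fails this gate before it ever reaches &lt;code&gt;prj.conf&lt;/code&gt;.&lt;/p&gt;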

&lt;p&gt;Third, &lt;strong&gt;it embeds a Red Team prompt&lt;/strong&gt;. The second Fact-Check, right after planning, centers on logical weaknesses rather than cross-referencing official docs. I pin a single line into the skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a senior engineer. Find the three weakest links in the plan below,
and for each one describe a concrete failure scenario and the moment it fails.
"It'll probably be fine" counts as one of the three failures.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last line matters more than it looks. Without pinning the Red Team role, the AI wraps up with "a mostly solid plan."&lt;/p&gt;

&lt;p&gt;Fourth, &lt;strong&gt;it never modifies the source&lt;/strong&gt;. Fact-Check output does not touch &lt;code&gt;findings.md&lt;/code&gt; or the plan file; it writes to a &lt;strong&gt;separate report file&lt;/strong&gt;. The moment a verifier edits the verification target, the verifier becomes a new source of contamination. This has to be enforced by structure, not by discipline — in Claude Code, scope &lt;code&gt;allowed-tools&lt;/code&gt; to &lt;code&gt;Read, Grep, WebFetch, Write(fact-check-report.md)&lt;/code&gt;. Write permission opens for the report file only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you freeze it as a file&lt;/strong&gt; — input is the path of the document to verify, output is a single report file. The four items to pin are exactly the four paragraphs above. One addition: Fact-Check is a skill that &lt;strong&gt;should have no side effects&lt;/strong&gt;, so setting &lt;code&gt;disable-model-invocation: true&lt;/code&gt; and only running it on explicit invocation is the safer default. The Claude Code skill system exposes that flag for exactly this use case.&lt;/p&gt;
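&lt;p&gt;A minimal sketch of that skill file, again assuming Claude Code's &lt;code&gt;SKILL.md&lt;/code&gt; frontmatter shape (the wording is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: fact-check
description: Verify a findings or plan file in a fresh session. Report only.
disable-model-invocation: true
allowed-tools: Read, Grep, WebFetch, Write(fact-check-report.md)
---

Input: path of the document to verify. Read it cold, as a first-time reader.
Extract every symbol, API, and version claim and check each one against the
actual tree or official docs. Absent symbols get [TBD: needs fact-check].
Write results to fact-check-report.md only. Never edit the source document.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;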

&lt;h2&gt;
  
  
  5. Skill 3 — Plan: Draft It Twice, Break It Once
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Plan Mode&lt;/a&gt; is not universal. The Claude Code docs admit this plainly: "If you could describe the diff in one sentence, skip the plan." Turning on Plan Mode for a one-sentence diff costs more than it returns. My Plan stage does &lt;strong&gt;not&lt;/strong&gt; run every time — it runs when I feel "my mental model and the AI's mental model might be misaligned before I touch code." The heuristic, from experience, is roughly: if more than two files are affected, or if I am touching a system I know less well, I always run Plan.&lt;/p&gt;

&lt;p&gt;When Plan runs, I draft twice. The first draft is the AI's; the second is the AI revising based on my annotations. At least two rounds. This is closest to Osmani's "waterfall in 15 minutes" analogy, and what matters is that the two rounds serve different goals. Round one &lt;strong&gt;enumerates the full list of steps&lt;/strong&gt;. Round two &lt;strong&gt;attaches the trade-offs round one missed&lt;/strong&gt;. It is not doing the same thing twice.&lt;/p&gt;

&lt;p&gt;Round two carries one extra prompt — "where will this plan fail first?" It is the same family as the Red Team in Fact-Check #2, but the timing differs. Throwing one self-destructive question before the plan hardens turns the answer into a &lt;strong&gt;"risks" section&lt;/strong&gt; that then acts as a warning light throughout implementation. Write the plan with future-you as its reader, and its usefulness goes up.&lt;/p&gt;

&lt;p&gt;I pin four required elements to every plan. ① &lt;strong&gt;Approach detail&lt;/strong&gt; — why this order, why not another. ② &lt;strong&gt;Before/after code snippets&lt;/strong&gt; — not "refactor this part," but the actual shape of the change. ③ &lt;strong&gt;Exact file paths&lt;/strong&gt; — no "modify the relevant files." ④ &lt;strong&gt;Explicit trade-offs&lt;/strong&gt; — chosen approach, alternatives, reasons for rejection. Miss these four and the plan does not reach the level of "Claude can implement this right now." Most of the mid-implementation "what should I do here?" interruptions come from vague plans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you freeze it as a file&lt;/strong&gt; — input is the path of a research file (or &lt;code&gt;findings.md&lt;/code&gt;), output is a single plan document with checkboxes (&lt;code&gt;- [ ]&lt;/code&gt;). The five things to pin into the skill: ① read the research file first (without this the AI plans from memory), ② justify the chosen order, ③ before/after code snippets, ④ exact file paths, ⑤ trade-offs. For items four and five, leaving empty slots in the skill's example template builds the habit of filling them in.&lt;/p&gt;
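&lt;p&gt;An illustrative slice of such a template, empty slots included (the paths and symbols here are examples, not prescriptions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Steps
- [ ] 1. Add the sensor node to boards/my_board.overlay (exact path, never "the overlay")
- [ ] 2. Set CONFIG_SPI_NRFX_SPIM=y in prj.conf

## Before / after
prj.conf before: (paste current lines here)
prj.conf after:  (paste changed lines here)

## Trade-offs
Chosen: (approach, and why this order)
Rejected: (alternatives, and the reason each lost)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;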

&lt;h2&gt;
  
  
  6. Skill 4 — Implement: Structured Context Cuts Hallucinations
&lt;/h2&gt;

&lt;p&gt;The one thing I learned at Implement is the equation &lt;strong&gt;context = answer key&lt;/strong&gt;. The AI's probability of producing the right answer is far more sensitive to &lt;strong&gt;the quality of the context you attach&lt;/strong&gt; than to the "instructions" in your prompt. When I &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;had the AI write a Devicetree overlay&lt;/a&gt;, I bundled the board file, the target node's DTS, and the binding YAML together via &lt;code&gt;@&lt;/code&gt; references — and a task that had failed three times passed on the first try. The write-up at &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;reversetobuild.com&lt;/a&gt; recommends the exact same pattern: inject &lt;code&gt;@boards/arm/nrf52840dk_nrf52840.dts&lt;/code&gt;, &lt;code&gt;@zephyr/dts/arm/nordic/nrf52840.dtsi&lt;/code&gt;, and &lt;code&gt;@zephyr/dts/bindings/spi/spi-device.yaml&lt;/code&gt; together. That the same bundle keeps being necessary tells me this is not a personal trick but a &lt;strong&gt;structural requirement of the domain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The counterpart to structured context is &lt;strong&gt;incremental implementation&lt;/strong&gt;. Generating 200 lines at once is almost always worse than generating 20 lines ten times. The reason is simple — every 20 lines the build runs, the type checker runs, the linter runs. Errors surface immediately, and the AI generates the next 20 lines in a &lt;strong&gt;slightly different context&lt;/strong&gt;. Generate 200 at once and a single error pulls the whole block down, and the AI loses context about where to start fixing. "Short and frequent" is not just a commit principle; it is a generation principle.&lt;/p&gt;

&lt;p&gt;And the most practical rule — &lt;strong&gt;no type escapes&lt;/strong&gt;. TypeScript's &lt;code&gt;any&lt;/code&gt; and &lt;code&gt;unknown&lt;/code&gt;, Python's &lt;code&gt;Any&lt;/code&gt;, Go's &lt;code&gt;interface{}&lt;/code&gt;, C/C++'s &lt;code&gt;void*&lt;/code&gt;. These are the first exits the AI takes when stuck. When the AI plasters over a spot with &lt;code&gt;any&lt;/code&gt; "just to make it run," that spot is exactly where the runtime error lands a few weeks later. My Implement skill text bans these explicitly, but &lt;strong&gt;the enforcement is a hook, not a skill&lt;/strong&gt;. A PostToolUse hook called &lt;code&gt;auto-typecheck.sh&lt;/code&gt; runs &lt;code&gt;tsc --noEmit&lt;/code&gt; or &lt;code&gt;mypy&lt;/code&gt; right after a file edit and, on any &lt;code&gt;any&lt;/code&gt; regression or type error, blocks the tool call itself with &lt;code&gt;exit 2&lt;/code&gt;. Skill text is persuasion; the hook is the contract. Do not mix the two.&lt;/p&gt;
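&lt;p&gt;A minimal sketch of that hook, assuming Claude Code's PostToolUse protocol (JSON event on stdin, &lt;code&gt;exit 2&lt;/code&gt; blocks the edit and feeds stderr back to the agent) and &lt;code&gt;jq&lt;/code&gt; on the PATH; the &lt;code&gt;any&lt;/code&gt; pattern is a rough heuristic, not a parser:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/sh
# auto-typecheck.sh -- PostToolUse hook sketch: block edits that introduce
# a type escape or fail the type checker.
file=$(jq -r '.tool_input.file_path // empty')
[ -n "$file" ] || exit 0
case "$file" in
  *.ts|*.tsx)
    if grep -nE ':[[:space:]]*any\b' "$file" 1&gt;&amp;2; then
      echo "type escape: explicit any in $file" 1&gt;&amp;2
      exit 2
    fi
    tsc --noEmit 1&gt;&amp;2 || exit 2
    ;;
  *.py)
    mypy "$file" 1&gt;&amp;2 || exit 2
    ;;
esac
exit 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;any&lt;/code&gt; grep fires before the type checker runs, so the cheapest rejection comes first.&lt;/p&gt;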

&lt;p&gt;Security code is the exception. Cryptography, authentication, signing, key management — I do not hand these to the AI. You might ask "can't you just review it?" I'll come back to that in the Review section. The short version: &lt;strong&gt;the distribution of bugs an AI reviewer misses overlaps with the distribution of security bugs&lt;/strong&gt;. So I remove them from the generation stage entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you freeze it as a file&lt;/strong&gt; — input is a plan file path, output is code plus plan-checkbox updates. Skill items to pin: ① take the plan file as an argument and execute in order, ② mark &lt;code&gt;- [ ]&lt;/code&gt; to &lt;code&gt;- [x]&lt;/code&gt; on each step to reflect real-time progress in the file, ③ &lt;strong&gt;do not stop until all steps are done&lt;/strong&gt; — no mid-check prompts (without this, the AI asks "continue?" every step), ④ ban &lt;code&gt;any&lt;/code&gt;, &lt;code&gt;unknown&lt;/code&gt;, &lt;code&gt;interface{}&lt;/code&gt;, &lt;code&gt;void*&lt;/code&gt;, ⑤ run the language's type checker after every file edit. Note: items ④ and ⑤ are &lt;strong&gt;instructed&lt;/strong&gt; in the skill text, but the enforcement lives in the hook. Section 9 covers this separation directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Skill 5 — Debug: What the AI Is Best at Is Reading Logs
&lt;/h2&gt;

&lt;p&gt;Here I flip the tone. The last five sections weighted toward "how the AI gets things wrong." Debug is &lt;strong&gt;the stage where I get the most out of the AI&lt;/strong&gt;. Logs are &lt;strong&gt;fact data&lt;/strong&gt;. Compiler error messages, runtime stack traces, panic dumps over serial. These are inputs the AI cannot fabricate — more precisely, inputs it has no need to fabricate — and the room for hallucination shrinks dramatically. At Debug, the share of AI suggestions I accept is higher than at any other stage.&lt;/p&gt;

&lt;p&gt;Three tips are enough. First, &lt;strong&gt;pass the error message with the related source&lt;/strong&gt;. Hand over only the error and the AI guesses. Include the 30 lines around the error line, the definitions of functions on the call stack, and the relevant headers, and the guesswork turns into analysis. Second, &lt;strong&gt;have the AI write a minimal reproduction&lt;/strong&gt;. Tell it "write the smallest program that reproduces this bug" and in the process the AI has to make the bug's assumptions explicit. That explicitness often surfaces the root cause. Third, &lt;strong&gt;structured log formatting&lt;/strong&gt;. Emit serial logs as JSON or at least with consistent tags (&lt;code&gt;[BLE]&lt;/code&gt;, &lt;code&gt;[OTA]&lt;/code&gt;, &lt;code&gt;[MCUBOOT]&lt;/code&gt;) and the AI's pattern matching gets much stronger. The reason I enforced tag formats in &lt;a href="https://reversetobuild.com/firmware-hil-ci-pipeline/" rel="noopener noreferrer"&gt;the HIL CI story&lt;/a&gt; was not for the human reader — it was to make the logs easier for the AI to read.&lt;/p&gt;

&lt;p&gt;Here this essay hits its paradox. &lt;strong&gt;Do not freeze Debug itself as a skill.&lt;/strong&gt; I embedded Research, Plan, Fact-Check, Implement, and Review as skill files but deliberately left Debug out. One reason — Debug is inherently &lt;strong&gt;reactive&lt;/strong&gt;. Every incident has a different error class, different related files, different reproduction conditions. Packing that into a single &lt;code&gt;SKILL.md&lt;/code&gt; kills flexibility. The AI ends up following a "generalized debug procedure" and misses the specific oddity of this incident. Prompt patterns are better cooked on the spot, per situation.&lt;/p&gt;

&lt;p&gt;I did pull &lt;strong&gt;input-bundle normalization&lt;/strong&gt; into its own skill. I call it &lt;code&gt;error-bundle&lt;/code&gt;, and it does exactly one thing — packs the error log, the related source files, and the reproduction conditions into a fixed shape and attaches them to the AI's context. The core work (hypothesis, root-cause tracing) stays in ad-hoc prompts; only the repetitive input prep is skillified.&lt;/p&gt;

&lt;p&gt;This boundary surfaces a principle running through this entire essay — &lt;strong&gt;reactive work and productive work have different skillification returns&lt;/strong&gt;. Productive work (Research, Plan, Implement, Review, Fact-Check) has fixed I/O, so freezing pays. Reactive work (Debug) varies per incident, so freezing the core hurts. Debug is the clearest illustration of that boundary. Skillifying is not always the answer; it is the answer only when there is a repeating &lt;strong&gt;shape&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Skill 6 — Review: Have an AI Review AI Code — Don't Trust It
&lt;/h2&gt;

&lt;p&gt;In an &lt;a href="https://reversetobuild.com/firmware-hil-ci-pipeline/" rel="noopener noreferrer"&gt;HIL CI pipeline&lt;/a&gt; I built a double loop where one AI session reviews code written by another. The bugs that loop has caught fall into three types — missing edge cases, style inconsistencies, minor type weaknesses. The bugs it misses all cluster into one category — &lt;strong&gt;"code that runs but is bad."&lt;/strong&gt; Architectural wrong turns, performance bottlenecks, race conditions, security holes. As Cursor put it in one line: "AI-generated code can look right while being subtly wrong." Reviewer AIs share the same weakness, so the subtle wrongness the author missed is the same subtle wrongness the reviewer misses.&lt;/p&gt;

&lt;p&gt;So my Review skill is designed around &lt;strong&gt;surfacing the areas the AI cannot catch&lt;/strong&gt;. Three rules.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;pin a Staff Engineer persona&lt;/strong&gt;. Personas are the oldest prompt trick in the book, but rewriting them on every call is wasteful. Put it at the top of the skill once and every review gets the "senior lens" automatically. My current persona reads: "You are a 10-year Staff Engineer. Sort production failure scenarios for this code by cost. Address style last." That last line matters — without it the AI starts with the easy wins and runs out of steam by the time it reaches real structural problems.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;force an independent session&lt;/strong&gt;. Same principle as Fact-Check, and it matters even more here. When the author's session reviews its own code, confirmation bias doubles — the AI remembers the logic it just wrote and verifies only inside that logic. The Claude Code docs capture it in one line: "A fresh context improves code review since Claude won't be biased toward code it just wrote." Implementation-wise, split it into a subagent and grant only &lt;code&gt;Read, Grep, Glob&lt;/code&gt;. A reviewer that cannot edit code is the only real reviewer.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;auto-flag security paths for manual review&lt;/strong&gt;. When the reviewer generates the report, if any modified file path touches &lt;code&gt;auth/&lt;/code&gt;, &lt;code&gt;crypto/&lt;/code&gt;, &lt;code&gt;sign/&lt;/code&gt;, or &lt;code&gt;token/&lt;/code&gt;, the report inserts a top banner: "⚠ SECURITY PATH — AI review is not sufficient." That banner requires a human to read and remove it manually before the pipeline moves on. My time using these agents confirmed that the security-bug distribution does not overlap with the bugs AI review is good at catching, and that confirmation only becomes permanent when I pin it into the skill file. Relying on memory to be careful manually every time fails eventually.&lt;/p&gt;
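&lt;p&gt;The path rule itself can be deterministic. A small sketch, assuming the changed-file list arrives on stdin (the script and report names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/sh
# flag-security-paths.sh -- reads changed file paths on stdin; if any path
# sits under a sensitive directory, prepend a banner to the review report
# that a human must delete before the pipeline moves on.
REPORT="${1:-review-report.md}"
if grep -qE '(^|/)(auth|crypto|sign|token)/' -; then
  tmp=$(mktemp)
  { echo "⚠ SECURITY PATH -- AI review is not sufficient."; echo; cat "$REPORT"; } &gt; "$tmp"
  mv "$tmp" "$REPORT"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feed it from the review diff, e.g. &lt;code&gt;git diff --name-only main | sh flag-security-paths.sh review-report.md&lt;/code&gt;.&lt;/p&gt;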

&lt;p&gt;The rule that reviewers do not modify the source is identical to Fact-Check. The review report is a separate file, and only that file is writable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you freeze it as a file&lt;/strong&gt; — input is a diff or list of file paths, output is a review report. Four items to pin: ① Staff Engineer persona at the top of the skill, ② force an independent session (&lt;code&gt;allowed-tools: Read, Grep, Glob&lt;/code&gt; + subagent), ③ auto-flag for security paths, ④ no source modification (write permission limited to the report file). Along with Fact-Check, this is the skill with &lt;strong&gt;the highest file-freezing return&lt;/strong&gt; — its I/O shape is fixed, so reuse pays immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Three Traps You Hit When Building Many Skills
&lt;/h2&gt;

&lt;p&gt;I have designed the six skills above and covered each one's individual requirements in the "If you freeze it as a file" blocks. But as you stack skills one after another, problems emerge that belong not to any single skill but to the &lt;strong&gt;entire skill repository&lt;/strong&gt;. These three are cross-cutting warnings that do not fit inside any one section above. I stepped on each of them once while using AI coding agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Naming collisions.&lt;/strong&gt; Generic names like &lt;code&gt;planner&lt;/code&gt;, &lt;code&gt;research&lt;/code&gt;, &lt;code&gt;review&lt;/code&gt; collide the moment a second pipeline exists. My repo currently has four planners — &lt;code&gt;planner&lt;/code&gt; (essay planning), &lt;code&gt;reddit-post-planner&lt;/code&gt;, &lt;code&gt;x-thread-planner&lt;/code&gt;, and &lt;code&gt;impl-planner&lt;/code&gt; (code implementation planning). It started as a single &lt;code&gt;planner&lt;/code&gt;. Then I built a Reddit-post pipeline and named that one &lt;code&gt;planner&lt;/code&gt; too, and two identically named skills started colliding across contexts. Only after I added domain prefixes (&lt;code&gt;impl-&lt;/code&gt;, &lt;code&gt;reddit-post-&lt;/code&gt;, &lt;code&gt;x-thread-&lt;/code&gt;) did the confusion stop. &lt;strong&gt;Prefix by domain from day one.&lt;/strong&gt; &lt;code&gt;engineering-essay-planner&lt;/code&gt; looks excessive when there is only one planner, but the day a second planner appears always comes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Rule-copy debt.&lt;/strong&gt; Copy-pasting rules like "no &lt;code&gt;any&lt;/code&gt;," "persistent type checking," or "do not modify source documents" into multiple &lt;code&gt;SKILL.md&lt;/code&gt; files means the day comes when you fix one copy and forget the rest. I once had "do not modify source" copy-pasted into Research, Fact-Check, and Review, and changing one policy meant editing three files. Global rules belong in &lt;code&gt;.claude/rules/&lt;/code&gt; or &lt;code&gt;CLAUDE.md&lt;/code&gt; &lt;strong&gt;exactly once&lt;/strong&gt;, and each skill references them. The Claude Code docs flag this sharply: "Bloated CLAUDE.md files cause Claude to ignore your actual instructions." Copy-pasted rules grow length, and growth &lt;strong&gt;dilutes the weight of every instruction&lt;/strong&gt;. Cursor's guidance points the same way: "Add rules only when you notice the agent making the same mistake repeatedly." Rules are added when they earn it, and added rules live in one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Hook alignment.&lt;/strong&gt; The third is the subtlest. If a skill is the tool for telling the AI "what to do," a hook is the tool for enforcing "what not to allow." Blur that boundary and both grow weak. My Implement skill text says "run the type checker after every file edit," but that is &lt;strong&gt;persuasion&lt;/strong&gt;. The actual &lt;strong&gt;enforcement&lt;/strong&gt; lives in the PostToolUse hook &lt;code&gt;auto-typecheck.sh&lt;/code&gt;, which runs &lt;code&gt;tsc --noEmit&lt;/code&gt; whenever a file is edited and, if any error appears, blocks the tool call with &lt;code&gt;exit 2&lt;/code&gt;. There is no way for the AI to bypass that block. The Claude Code docs put it in one line: "Unlike CLAUDE.md instructions which are advisory, hooks are deterministic and guarantee the action happens." &lt;strong&gt;Instructions are advice; hooks are contracts.&lt;/strong&gt; The "do not modify the source" rules in Fact-Check and Review work the same way — instruct it in the skill text, but the actual block comes from a PreToolUse hook like &lt;code&gt;protect-docs.sh&lt;/code&gt;. Try to make skills and hooks carry the same responsibility and one of them will betray you.&lt;/p&gt;
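&lt;p&gt;For reference, the skeleton of such a hook is small. A simplified sketch in the style of &lt;code&gt;auto-typecheck.sh&lt;/code&gt; (the helper function is my illustration; the real checker invocation is commented out so the sketch runs anywhere):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch of a PostToolUse hook in the style of auto-typecheck.sh.
# Claude Code hook contract: exit 0 allows the tool call to proceed,
# exit 2 blocks it and feeds stderr back to the agent.

# Map the type checker's exit status to the hook's exit code (0 = allow, 2 = block).
hook_exit_for() {
  if [ "$1" -eq 0 ]; then echo 0; else echo 2; fi
}

# Real usage would look like:
#   npx tsc --noEmit 2>tsc-errors.txt
#   if [ "$?" -ne 0 ]; then cat tsc-errors.txt 1>/dev/stderr; exit 2; fi

echo "checker status 0 maps to hook exit $(hook_exit_for 0)"
echo "checker status 1 maps to hook exit $(hook_exit_for 1)"
```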

&lt;p&gt;Compressed to one line: &lt;strong&gt;domain-prefixed naming / global rules in one place / delegate determinism to hooks.&lt;/strong&gt; I learned each of these the hard way, one incident each. You just read about them, so maybe you can skip one.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Six Skills That Compound
&lt;/h2&gt;

&lt;p&gt;Each of the six pays off on its own. Running the Research skill alone speeds up research; running Fact-Check alone filters out at least one wrong premise. But the real return of this structure comes from &lt;strong&gt;the relationships between the skills&lt;/strong&gt;. A clean Fact-Check gives Plan a clean input; a well-written Plan cuts hallucinations at Implement; a tight Implement cuts load at Debug; a fast Debug lets Review concentrate on structural issues. Improve any one stage by 1.2× and that 1.2 multiplies into the next (five stages of 1.2× compound to roughly 2.5×), until the original task feels two to three times faster. I call this &lt;strong&gt;AI compounding&lt;/strong&gt;. It is a return you cannot get from single-prompt improvements.&lt;/p&gt;

&lt;p&gt;The reason every stage needs a differently shaped gate is that every stage has a different failure mode. Research fails on source credibility, Plan fails on implicit premises, Implement fails on missing context, Debug fails on the opposite — context overflow — and Review fails on author bias. They all share the name "gate," but the machines inside are entirely different. One kind of device cannot block five kinds of failure.&lt;/p&gt;

&lt;p&gt;Next quarter I want to try two directions. One is &lt;strong&gt;automatic skill chaining&lt;/strong&gt; — Research finishes, Fact-Check fires automatically, and Plan fires only on pass. Today I type &lt;code&gt;/fact-check&lt;/code&gt; by hand. The other is &lt;strong&gt;expanding hook-based gate automation&lt;/strong&gt; — today only Implement has a hook attached, and I can see room for deterministic check hooks at Fact-Check and Review. If both land, gates will no longer be something &lt;strong&gt;I&lt;/strong&gt; guard — they become something &lt;strong&gt;the system&lt;/strong&gt; guards. When that moment arrives I'll have another reason to write a follow-up.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>hallucination</category>
      <category>zephyr</category>
    </item>
    <item>
      <title>The Build Passed, So Why Doesn't It Run — Automating Firmware Tests on Real Hardware</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Wed, 15 Apr 2026 21:29:49 +0000</pubDate>
      <link>https://dev.to/erratahunter/the-build-passed-so-why-doesnt-it-run-automating-firmware-tests-on-real-hardware-2m5i</link>
      <guid>https://dev.to/erratahunter/the-build-passed-so-why-doesnt-it-run-automating-firmware-tests-on-real-hardware-2m5i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A passing &lt;code&gt;west build&lt;/code&gt; doesn't mean the firmware runs on hardware — catching silent failures manually doesn't scale.&lt;/li&gt;
&lt;li&gt;Combining Zephyr Twister's &lt;code&gt;--device-testing&lt;/code&gt; mode with a self-hosted runner gives you automated serial-log-based testing on real boards with every push.&lt;/li&gt;
&lt;li&gt;All you need to start is a Raspberry Pi (or even a Windows PC) and a J-Link.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The most deflating moment in firmware development is when &lt;code&gt;west build&lt;/code&gt; passes cleanly, you flash the board, and nothing happens. No serial output, no LED activity — just a Hard Fault dump, or worse, dead silence.&lt;/p&gt;

&lt;p&gt;A successful build means "the code compiles and links." It does not mean "the firmware behaves as intended on hardware." Verifying that gap requires plugging in a J-Link, flashing, opening a serial terminal, and reading the logs by hand. Repeat this for every change, and eventually you start cutting corners: "I'll just spot-check this one." Those shortcuts compound, and regression bugs creep in.&lt;/p&gt;

&lt;p&gt;This post documents how I automated that manual verification. On every push, the firmware is automatically flashed to a real board, serial logs are captured, and pass/fail is determined — a HIL (Hardware-in-the-Loop) CI (Continuous Integration) pipeline. I built it with Zephyr's &lt;a href="https://docs.zephyrproject.org/latest/develop/test/twister.html" rel="noopener noreferrer"&gt;Twister&lt;/a&gt; test framework, a self-hosted runner, and a single Raspberry Pi.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the fifth post in the "AI and Embedded Firmware" series. The previous post introduced a workflow for structurally preventing AI hallucinations, and this post closes the last gap: manual testing. Each post stands on its own — you don't need to read the series in order.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Cost of "Just Flash It and See"
&lt;/h2&gt;

&lt;p&gt;The weakest link in the four-stage loop was the final stage: testing. After the AI wrote the code, I reviewed it, and &lt;code&gt;west build&lt;/code&gt; passed, the next step was entirely manual — plug in a J-Link (SEGGER's debug probe), run &lt;code&gt;west flash&lt;/code&gt;, open a serial terminal, and read the logs.&lt;/p&gt;

&lt;p&gt;Three problems with this manual routine:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, a passing build doesn't guarantee correct behavior.&lt;/strong&gt; AI-generated Zephyr code frequently passed &lt;code&gt;west build&lt;/code&gt; and then silently failed on hardware. Calling &lt;code&gt;k_sleep()&lt;/code&gt; inside a timer callback, attempting heap allocation in an ISR (Interrupt Service Routine) context — the compiler catches none of this. You only see the Hard Fault after flashing, or worse, the board just hangs with zero output. The &lt;a href="https://survey.stackoverflow.co/2025/" rel="noopener noreferrer"&gt;Stack Overflow 2025 Developer Survey&lt;/a&gt; reported that 45% of developers spend more time debugging AI-generated code than writing it themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, manual testing doesn't scale.&lt;/strong&gt; Fixing one feature can break another. In web development, automated test suites catch these regressions. My firmware workflow had no such safety net. Testing every feature manually after every change is impractical, so I'd only verify "the part I just touched." That's exactly how regression bugs get in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, repetition creates friction.&lt;/strong&gt; Plug in J-Link, wait for flash, open serial monitor, scan for log patterns, record the result. Two to three minutes each time, but across dozens of iterations per day with the AI loop, the cumulative cost adds up. The real damage, though, is the temptation to skip it. And the code you skip testing on is the code that causes problems later.&lt;/p&gt;

&lt;p&gt;The first three stages of the loop — research, planning, execution — were already efficient thanks to AI collaboration. But as long as the last stage was manual, it bottlenecked the entire pipeline. The missing piece was test automation. And in embedded, test automation means putting real hardware in the loop — HIL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wait — Can't I Just Test on the Board Locally?
&lt;/h3&gt;

&lt;p&gt;"I already have a board on my desk and I'm running &lt;code&gt;west flash&lt;/code&gt; — why bother with CI?" I thought the same thing at first.&lt;/p&gt;

&lt;p&gt;Local testing and HIL CI perform the same physical actions (flash, check serial logs), but the implications differ:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local Testing (My Desk)&lt;/th&gt;
&lt;th&gt;HIL CI (Automated)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;When it runs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When I remember to do it&lt;/td&gt;
&lt;td&gt;Automatically on every push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tends to verify "just the part I changed"&lt;/td&gt;
&lt;td&gt;Runs the entire defined test suite every time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Environment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on my PC's current state&lt;/td&gt;
&lt;td&gt;Pinned via Docker / fixed SDK version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Records&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stays in my head&lt;/td&gt;
&lt;td&gt;Persisted in CI logs, visible to the team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Regression prevention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Previous features are probably fine"&lt;/td&gt;
&lt;td&gt;"Previous features still pass" — verified automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It's the same difference as running &lt;code&gt;npm test&lt;/code&gt; locally versus having GitHub Actions run it on every PR. Local testing is a snapshot of "what I verified right now." CI testing is a gate that "every change must pass before merge."&lt;/p&gt;

&lt;p&gt;This difference matters especially for firmware because regressions take far longer to surface. A web service shows error rate spikes on a monitoring dashboard immediately after deploy. Firmware can silently malfunction — intermittent BLE disconnects, failing to wake from sleep under specific conditions. You may not find out until a customer reports it. CI verifying basic behavior on every commit means that at minimum, you can &lt;code&gt;git bisect&lt;/code&gt; to find "when exactly did it break."&lt;/p&gt;
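&lt;p&gt;The &lt;code&gt;git bisect&lt;/code&gt; mechanics are worth seeing once. A toy demo in which a marker file stands in for "the HIL test fails" (in real use the run command would be the Twister line from the CI config):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Toy demo of `git bisect run`: a marker file stands in for a failing HIL test.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

for i in 1 2 3 4 5; do
  if [ "$i" -ge 4 ]; then touch BROKEN; fi   # commit 4 introduces the regression
  echo "build $i" > version.txt
  git add -A
  git commit -q -m "commit $i"
done

# HEAD is bad, the root commit is known good. The run command is the oracle:
# exit 0 means the "HIL test" passed on that commit, non-zero means it failed.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run sh -c 'test ! -f BROKEN' > /dev/null
git show -s --format=%s refs/bisect/bad   # prints: commit 4
```

With real hardware, the oracle becomes &lt;code&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/code&gt;, and bisect does the flashing and checking for you.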




&lt;h2&gt;
  
  
  Embedded CI Is Not Web CI
&lt;/h2&gt;

&lt;p&gt;Web backend CI is comparatively straightforward. Push code, a cloud VM spins up, installs dependencies, runs tests, reports results. The VM starts clean every time, so environment-related flaky tests are relatively rare.&lt;/p&gt;

&lt;p&gt;Embedded CI is fundamentally different.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Limits of QEMU
&lt;/h3&gt;

&lt;p&gt;Zephyr has a built-in test runner called &lt;a href="https://docs.zephyrproject.org/latest/develop/test/twister.html" rel="noopener noreferrer"&gt;Twister&lt;/a&gt;, and Twister can run tests on &lt;a href="https://www.qemu.org/" rel="noopener noreferrer"&gt;QEMU&lt;/a&gt; (an open-source hardware emulator). Testing without a physical board, straight from a CI server — that's appealing. The Zephyr project itself runs thousands of QEMU-based Twister tests.&lt;/p&gt;

&lt;p&gt;But QEMU's coverage has hard limits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Verifiable with QEMU&lt;/th&gt;
&lt;th&gt;Not Verifiable with QEMU&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kernel scheduling, mutexes, semaphores&lt;/td&gt;
&lt;td&gt;GPIO, SPI, I2C driver behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory allocation/deallocation logic&lt;/td&gt;
&lt;td&gt;BLE stack (connection, pairing, data transfer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data structures, protocol parsing&lt;/td&gt;
&lt;td&gt;DMA (Direct Memory Access) transfers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State machine transitions&lt;/td&gt;
&lt;td&gt;Interrupt timing, priority inversion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pure algorithm tests&lt;/td&gt;
&lt;td&gt;Power management (sleep, wake)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;QEMU's device models implement only a subset of peripheral behavior — whatever an emulated environment doesn't need simply isn't there. The core functionality of most product firmware sits in the right column. The reality is: "Most firmware is too tightly coupled with hardware for emulation to be the only path forward — at some point, the dev board is the only way to make progress."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://renode.io/" rel="noopener noreferrer"&gt;Renode&lt;/a&gt; is an alternative emulator with richer peripheral emulation. Memfault's Interrupt blog covered a &lt;a href="https://interrupt.memfault.com/blog/test-automation-renode" rel="noopener noreferrer"&gt;test automation case combining GitHub Actions and Renode&lt;/a&gt;. But no matter how advanced emulators get, reproducing BLE RF paths or real sensor analog characteristics remains fundamentally difficult.&lt;/p&gt;

&lt;h3&gt;
  
  
  Variables That Only Physical Hardware Creates
&lt;/h3&gt;

&lt;p&gt;Real-board testing introduces variables that don't exist in emulation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timing:&lt;/strong&gt; Virtual time in an emulator and physical time on real hardware flow differently. A 100ms timeout can pass in QEMU and fail on the board.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power:&lt;/strong&gt; Unstable USB hub power can reset the board or interrupt flashing mid-process. The CI log just says "connection lost."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RF environment:&lt;/strong&gt; BLE tests are affected by ambient Wi-Fi interference. The same code can pass at the office and fail in the server room.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These variables create flaky tests. In web CI, flaky tests are mostly async timing issues fixable by code changes. In embedded CI, flaky tests are often caused by the physical environment — no amount of code changes will eliminate them.&lt;/p&gt;

&lt;p&gt;That's the reality. Embedded CI is not a world where "correct code guarantees passing tests." But it's still better than manual testing. "Imperfect but automated verification" is more reliable in practice than "thorough but human-dependent verification." I decided to build a HIL CI pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pipeline Design — How Far to Automate
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Self-hosted Runner: The Common Pattern for Connecting Physical Boards to CI
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;, &lt;a href="https://docs.gitlab.com/ci/" rel="noopener noreferrer"&gt;GitLab CI/CD&lt;/a&gt;, &lt;a href="https://support.atlassian.com/bitbucket-cloud/docs/get-started-with-bitbucket-pipelines/" rel="noopener noreferrer"&gt;Bitbucket Pipelines&lt;/a&gt; — all default to cloud VM runners. You can't plug an nRF52 DK into a cloud VM via USB, so all three platforms support &lt;strong&gt;self-hosted runners&lt;/strong&gt;: installing a CI agent on your own physical machine.&lt;/p&gt;

&lt;p&gt;The architecture is the same regardless of platform:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1zplnf2rlnj8qw97lb2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1zplnf2rlnj8qw97lb2.webp" alt="HIL CI architecture — Git push triggers cloud workflow, self-hosted runner flashes firmware via USB to nRF52 DK" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Self-hosted runner bridges the cloud CI platform and the physical board&lt;/p&gt;

&lt;p&gt;I chose a Raspberry Pi 4 as the runner. The reason is simple: low power consumption for 24/7 operation, four USB ports for connecting multiple boards, and ARM Linux where the Zephyr toolchain runs natively. [TBD: Need to add actual Raspberry Pi performance/stability experience after use]&lt;/p&gt;

&lt;h3&gt;
  
  
  You Don't Need a Raspberry Pi
&lt;/h3&gt;

&lt;p&gt;"Do I have to buy a Raspberry Pi?" No. A self-hosted runner is any machine that can run the CI agent software. A Linux desktop, a macOS laptop, even a Windows PC works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using a Windows PC as a runner:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt;, &lt;a href="https://docs.gitlab.com/runner/install/windows/" rel="noopener noreferrer"&gt;GitLab CI&lt;/a&gt;, and &lt;a href="https://support.atlassian.com/bitbucket-cloud/docs/runners/" rel="noopener noreferrer"&gt;Bitbucket Pipelines&lt;/a&gt; all officially support Windows runner agents. The GitHub Actions runner is best installed in a drive root folder like &lt;code&gt;C:\actions-runner&lt;/code&gt; (to avoid Windows path length limits), and GitLab Runner provides an &lt;code&gt;.exe&lt;/code&gt; installer.&lt;/p&gt;

&lt;p&gt;Build and flash tools also run on Windows. &lt;code&gt;west build&lt;/code&gt;, &lt;code&gt;west flash&lt;/code&gt;, and &lt;a href="https://docs.nordicsemi.com/bundle/ug_nrf_cltools/page/UG/cltools/nrf_nrfjprogexe_reference.html" rel="noopener noreferrer"&gt;&lt;code&gt;nrfjprog&lt;/code&gt;&lt;/a&gt; all officially support Windows. Install &lt;a href="https://www.nordicsemi.com/Products/Development-tools/nRF-Command-Line-Tools/Download" rel="noopener noreferrer"&gt;nRF Command Line Tools&lt;/a&gt;, and &lt;code&gt;nrfjprog&lt;/code&gt; is on your PATH. With J-Link drivers installed, you can flash to a USB-connected board immediately. Git for Windows includes Git Bash, so most shell commands in CI YAML &lt;code&gt;run:&lt;/code&gt; blocks execute as-is.&lt;/p&gt;

&lt;p&gt;The trade-offs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Raspberry Pi&lt;/th&gt;
&lt;th&gt;Windows PC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;24/7 operation&lt;/td&gt;
&lt;td&gt;5W power draw, no issue&lt;/td&gt;
&lt;td&gt;Keeping a PC always on is impractical; sleep mode kills the runner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker support&lt;/td&gt;
&lt;td&gt;Native Linux, works out of the box&lt;/td&gt;
&lt;td&gt;Requires Docker Desktop or WSL2. nrf-docker is an amd64 Linux image, so WSL2 backend is mandatory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USB stability&lt;/td&gt;
&lt;td&gt;Dedicated device, minimal interference&lt;/td&gt;
&lt;td&gt;Potential port contention with other USB devices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upfront cost&lt;/td&gt;
&lt;td&gt;~$100 (Pi + board)&lt;/td&gt;
&lt;td&gt;$0 if using an existing PC&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I prefer a dedicated runner machine, which is why I chose the Pi. But if you're just getting started, installing the runner on an existing Windows PC and plugging the board in via USB is the lowest-friction entry point. You can split it off to a Pi later once CI is stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform Comparison
&lt;/h3&gt;

&lt;p&gt;Runner registration differs across the three platforms, but the end result — "run CI jobs on a local machine with access to connected hardware" — is identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Actions:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/hil-test.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIL Test&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;flash-and-test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self-hosted&lt;/span&gt;  &lt;span class="c1"&gt;# runs on self-hosted runner&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build firmware&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west build -b nrf52dk/nrf52832&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Flash and test&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitLab CI/CD:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .gitlab-ci.yml&lt;/span&gt;
&lt;span class="na"&gt;hil-test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nrf52dk&lt;/span&gt;  &lt;span class="c1"&gt;# only runs on runners tagged with this label&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;west build -b nrf52dk/nrf52832&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bitbucket Pipelines:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# bitbucket-pipelines.yml&lt;/span&gt;
&lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIL Test&lt;/span&gt;
        &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;self.hosted&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;linux&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nrf52dk&lt;/span&gt;  &lt;span class="c1"&gt;# custom label&lt;/span&gt;
        &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;west build -b nrf52dk/nrf52832&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference is runner selection syntax. GitHub uses &lt;code&gt;runs-on: self-hosted&lt;/code&gt;, GitLab uses &lt;code&gt;tags:&lt;/code&gt;, Bitbucket uses &lt;code&gt;runs-on:&lt;/code&gt; with a label array. The build and test commands are identical.&lt;/p&gt;
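&lt;p&gt;The &lt;code&gt;hardware-map.yml&lt;/code&gt; all three commands reference tells Twister which board hangs off which port. Twister can generate a skeleton with &lt;code&gt;west twister --generate-hardware-map map.yml&lt;/code&gt;; a typical entry looks like this (the probe serial number and port are placeholders):&lt;/p&gt;

```yaml
# hardware-map.yml (illustrative values)
- connected: true
  id: "000683000000"        # J-Link probe serial number
  platform: nrf52dk/nrf52832
  product: J-Link
  runner: nrfjprog
  serial: /dev/ttyACM0      # board's UART console port
```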

&lt;p&gt;I found GitLab's tag system most natural for embedded. Tag runners with &lt;code&gt;nrf52dk&lt;/code&gt;, &lt;code&gt;esp32&lt;/code&gt;, &lt;code&gt;stm32f4&lt;/code&gt;, and tests automatically route to the matching hardware. I'd heard that one reason the embedded/semiconductor industry favors GitLab Self-managed instances is this flexible runner tag system — after trying it myself, I can see why.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happens on a Single Push — Step by Step
&lt;/h3&gt;

&lt;p&gt;The YAML reads as "build and test," but behind the scenes, three actors — the CI platform (cloud), the self-hosted runner (local machine), and the dev board (USB-connected) — interact through multiple sequential stages. Here's what happens at each step and where logs are generated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19j2ako46uvi0kz5lbru.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19j2ako46uvi0kz5lbru.webp" alt="HIL CI sequence diagram — step-by-step interaction between CI platform, self-hosted runner, and development board during a single push event" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The complete sequence from a single git push through build, flash, test, and verdict&lt;/p&gt;

&lt;p&gt;Breaking it down:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps 1-3: Cloud.&lt;/strong&gt; The developer pushes code. The CI platform reads the YAML, finds a matching runner, and dispatches the job. At this point, the code only exists in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps 4-5: Runner build.&lt;/strong&gt; The runner checks out the source and cross-compiles with &lt;code&gt;west build&lt;/code&gt;. Build logs are generated here. If the build fails, it stops and the error log is uploaded to the cloud. In the split Docker architecture, this step runs on a cloud runner (amd64).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps 6-8: Physical interaction with the board.&lt;/strong&gt; On a successful build, the runner uses &lt;code&gt;nrfjprog&lt;/code&gt; to flash the firmware via USB/J-Link. The board resets, boots the new firmware, and outputs logs through the UART serial port. &lt;strong&gt;This log capture is the core of HIL&lt;/strong&gt; — the runner opens the board's serial port (&lt;code&gt;/dev/ttyACM0&lt;/code&gt; or &lt;code&gt;COM3&lt;/code&gt; on Windows) and reads the output in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 9: Verdict.&lt;/strong&gt; Twister matches the captured serial log against regex patterns defined in &lt;code&gt;testcase.yaml&lt;/code&gt;. If "Feature initialized successfully" appears within the timeout, it's a pass. Otherwise, fail.&lt;/p&gt;
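&lt;p&gt;A minimal &lt;code&gt;testcase.yaml&lt;/code&gt; for that verdict step, using Twister's console harness (the test identifier and match strings are illustrative):&lt;/p&gt;

```yaml
# tests/feature/testcase.yaml (illustrative)
tests:
  app.feature.boot:
    harness: console
    harness_config:
      type: multi_line
      ordered: true         # patterns must appear in this order
      regex:
        - "Booting Zephyr"
        - "Feature initialized successfully"
    timeout: 60             # seconds before the verdict is "fail"
```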

&lt;p&gt;&lt;strong&gt;Steps 10-11: Reporting.&lt;/strong&gt; The runner uploads the verdict and log files to the cloud. The CI platform marks the PR with a check (pass or fail). On failure, serial logs are attached as artifacts for the developer to download and analyze.&lt;/p&gt;

&lt;p&gt;Where logs are generated:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Log Type&lt;/th&gt;
&lt;th&gt;Generated At&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;th&gt;What to Check on Failure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Build log&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runner (steps 4-5)&lt;/td&gt;
&lt;td&gt;Compile warnings/errors, linker errors&lt;/td&gt;
&lt;td&gt;Missing headers, Kconfig symbol errors, memory overflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flash log&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runner → Board (step 6)&lt;/td&gt;
&lt;td&gt;nrfjprog output, J-Link connection status&lt;/td&gt;
&lt;td&gt;USB recognition failure, J-Link firmware mismatch, board power issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serial log&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Board → Runner (step 8)&lt;/td&gt;
&lt;td&gt;Firmware boot messages, test output, Hard Fault dumps&lt;/td&gt;
&lt;td&gt;Init failure, ISR context violation, stack overflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Twister verdict log&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runner (step 9)&lt;/td&gt;
&lt;td&gt;pass/fail results, timeout info&lt;/td&gt;
&lt;td&gt;Pattern mismatch, timeout exceeded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Reproducing the Build Environment with Docker
&lt;/h3&gt;

&lt;p&gt;The most common CI failure is "it works on my PC but not in CI." The standard solution for Zephyr/NCS projects is Docker.&lt;/p&gt;

&lt;p&gt;Nordic provides an official Docker image called &lt;a href="https://github.com/NordicPlayground/nrf-docker" rel="noopener noreferrer"&gt;&lt;code&gt;nrf-docker&lt;/code&gt;&lt;/a&gt; on &lt;a href="https://hub.docker.com/r/nordicplayground/nrfconnect-sdk" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt; (&lt;code&gt;nordicplayground/nrfconnect-sdk&lt;/code&gt;). It contains every dependency needed to run west commands — the Zephyr SDK, a Python environment, and the west tool itself. You pull this image and use it as the build environment; you're not uploading your code to Docker Hub. It's the same idea as &lt;code&gt;apt install&lt;/code&gt; for the compiler.&lt;/p&gt;

&lt;p&gt;One caveat: this official image is &lt;strong&gt;amd64 (x86_64) only&lt;/strong&gt;. A Raspberry Pi is ARM64 and can't run this image directly. So the CI pipeline splits into two stages:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47s8mdt9iq2mllbdb6ny.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47s8mdt9iq2mllbdb6ny.webp" alt="Split CI pipeline — Docker build on amd64 cloud runner, flash and test on ARM64 Raspberry Pi self-hosted runner" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CI pipeline splitting amd64 Docker build from ARM64 Raspberry Pi testing&lt;/p&gt;

&lt;p&gt;How project files flow through each stage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;How&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;git checkout&lt;/td&gt;
&lt;td&gt;Cloud/local&lt;/td&gt;
&lt;td&gt;Full source code&lt;/td&gt;
&lt;td&gt;CI auto-clones from Git repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker pull&lt;/td&gt;
&lt;td&gt;Cloud/local&lt;/td&gt;
&lt;td&gt;Build tools (SDK, compiler)&lt;/td&gt;
&lt;td&gt;Downloads Nordic official image from Docker Hub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;west build&lt;/td&gt;
&lt;td&gt;Inside Docker container&lt;/td&gt;
&lt;td&gt;Source → zephyr.hex&lt;/td&gt;
&lt;td&gt;ARM cross-compilation (ARM binary built on amd64 host)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artifact transfer&lt;/td&gt;
&lt;td&gt;CI platform&lt;/td&gt;
&lt;td&gt;zephyr.hex (~hundreds of KB)&lt;/td&gt;
&lt;td&gt;GitHub Actions artifact, GitLab job artifact, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;west flash&lt;/td&gt;
&lt;td&gt;Raspberry Pi&lt;/td&gt;
&lt;td&gt;zephyr.hex → board&lt;/td&gt;
&lt;td&gt;nrfjprog flashes via USB/J-Link&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Twister test&lt;/td&gt;
&lt;td&gt;Raspberry Pi&lt;/td&gt;
&lt;td&gt;Serial logs&lt;/td&gt;
&lt;td&gt;Captures board UART output, pattern matches&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A single-stage architecture where the Raspberry Pi handles both build and flash without Docker is also viable. You'd install the Zephyr SDK and west directly on the Pi. Builds run 3-5x slower than on an amd64 machine, but the pipeline is simpler. I started with this single-stage setup since my project is small, and I'll switch to the split architecture if build time becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;The CI YAML for the split Docker architecture looks like this (GitHub Actions example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/hil-test.yml — split build/test architecture&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HIL Test (Split)&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;  &lt;span class="c1"&gt;# cloud runner (amd64)&lt;/span&gt;
    &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nordicplayground/nrfconnect-sdk:v2.9-branch&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west init -l . &amp;amp;&amp;amp; west update&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west build -b nrf52dk/nrf52832&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;firmware&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build/zephyr/zephyr.hex&lt;/span&gt;

  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self-hosted&lt;/span&gt;  &lt;span class="c1"&gt;# Raspberry Pi (ARM64)&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/download-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;firmware&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Flash firmware&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nrfjprog --program zephyr.hex --chiperase --verify --reset&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Twister tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I pinned my SDK version using the &lt;a href="https://dev.to/erratahunter/ncs-project-management-guide-ditching-global-install-to-reclaim-control-j39"&gt;T2 topology&lt;/a&gt;'s &lt;code&gt;west.yml&lt;/code&gt;, so running &lt;code&gt;west init&lt;/code&gt; and &lt;code&gt;west update&lt;/code&gt; inside the Docker image reproduces the exact same environment as my dev PC. Accessing USB devices from inside a Docker container requires the &lt;code&gt;--device&lt;/code&gt; flag, and its behavior varies subtly across platforms — which is another reason I chose the split architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  HIL CI Works Without T2 Topology Too
&lt;/h3&gt;

&lt;p&gt;The example above assumes T2 topology (a &lt;code&gt;west.yml&lt;/code&gt; manifest at the project root). But HIL CI itself doesn't require T2. All you need is "a buildable project" and "a board to flash."&lt;/p&gt;

&lt;p&gt;The build method in CI varies by project structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project Structure&lt;/th&gt;
&lt;th&gt;How to Build in CI&lt;/th&gt;
&lt;th&gt;SDK Version Management&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;T2 topology&lt;/strong&gt; (&lt;code&gt;west.yml&lt;/code&gt; present)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;west init -l . &amp;amp;&amp;amp; west update &amp;amp;&amp;amp; west build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;west.yml&lt;/code&gt; pins SDK revision — high reproducibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Freestanding&lt;/strong&gt; (local SDK folder, &lt;code&gt;ZEPHYR_BASE&lt;/code&gt; env var)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;export ZEPHYR_BASE=/path/to/sdk &amp;amp;&amp;amp; west build&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pre-install SDK on runner, or clone a specific version in CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;nRF Connect SDK + VS Code extension&lt;/strong&gt; (GUI-based build)&lt;/td&gt;
&lt;td&gt;Build the same project via CLI: &lt;code&gt;west build -b nrf52dk/nrf52832&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Pin SDK version via env var or Docker image tag&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The simplest way to put a freestanding project into CI is to pre-install the NCS SDK on the runner machine and set &lt;code&gt;ZEPHYR_BASE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Freestanding project CI example (GitHub Actions)&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hil-test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self-hosted&lt;/span&gt;  &lt;span class="c1"&gt;# runner with pre-installed SDK&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ZEPHYR_BASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/runner/ncs/v2.9.0/zephyr&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west build -b nrf52dk/nrf52832&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;west twister --device-testing --hardware-map hardware-map.yml -T tests/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downside: the SDK version is tied to the runner machine. Updating the runner's SDK affects every project. That's exactly why T2 topology uses &lt;code&gt;west.yml&lt;/code&gt; to pin SDK versions independently per project. But if you have a single project and just want to get CI running, freestanding is enough. You can upgrade the structure later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precedent: Golioth's Implementation
&lt;/h3&gt;

&lt;p&gt;The implementation I referenced most while designing this pipeline was &lt;a href="https://blog.golioth.io/golioth-hil-testing-part1/" rel="noopener noreferrer"&gt;Golioth's HIL case study&lt;/a&gt;. Golioth, an IoT platform company, runs exactly this architecture — Raspberry Pi + GitHub Actions self-hosted runner + nRF52840dk — to execute automated HIL tests on every PR.&lt;/p&gt;

&lt;p&gt;Key design decisions from Golioth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Record all connected devices in hardware-map.yml.&lt;/strong&gt; Serial port, device ID, platform, and runner info are managed in YAML. When a board is added or swapped, only this file needs updating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-stage WiFi/cloud credentials on the runner locally.&lt;/strong&gt; No secrets in the repository. Setup files live on the runner machine, and the workflow references them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-detect connected boards.&lt;/strong&gt; They wrote a script that automatically recognizes USB-connected boards and generates the hardware-map.yml. Physically swapping a board is reflected on the next CI run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I didn't adopt this structure wholesale. Golioth is a cloud service company, so they validate network connectivity, authentication, and OTA (Over-the-Air firmware update) via HIL. My immediate need was simpler: "flash after build, verify basic behavior via serial logs." Scope your automation to match your actual needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Twister + Real Hardware — Writing and Running Tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  hardware-map.yml and testcase.yaml
&lt;/h3&gt;

&lt;p&gt;Twister's &lt;code&gt;--device-testing&lt;/code&gt; mode operates on two YAML files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;hardware-map.yml&lt;/strong&gt; — physical board info connected to the runner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# hardware-map.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;connected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;000683459357&lt;/span&gt;        &lt;span class="c1"&gt;# J-Link serial number&lt;/span&gt;
  &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nrf52dk/nrf52832&lt;/span&gt;
  &lt;span class="na"&gt;product&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;J-Link&lt;/span&gt;
  &lt;span class="na"&gt;runner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nrfjprog&lt;/span&gt;         &lt;span class="c1"&gt;# flashing tool&lt;/span&gt;
  &lt;span class="na"&gt;serial&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/dev/ttyACM0&lt;/span&gt;     &lt;span class="c1"&gt;# serial port&lt;/span&gt;
  &lt;span class="na"&gt;baud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;115200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only boards with &lt;code&gt;connected: true&lt;/code&gt; are included as test targets. The J-Link serial number (&lt;code&gt;id&lt;/code&gt;) uniquely identifies each board, so multiple boards on the same runner don't conflict. Twister's hardware map currently supports the &lt;code&gt;pyocd&lt;/code&gt;, &lt;code&gt;nrfjprog&lt;/code&gt;, &lt;code&gt;jlink&lt;/code&gt;, &lt;code&gt;openocd&lt;/code&gt;, and &lt;code&gt;dediprog&lt;/code&gt; runners; support for other runners is still in progress.&lt;/p&gt;
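&lt;p&gt;The &lt;code&gt;connected: true&lt;/code&gt; filtering is easy to emulate. A sketch of selecting test targets, with plain Python dicts standing in for the parsed YAML entries (field names taken from the file above; the second board is a made-up example):&lt;/p&gt;

```python
# Hypothetical hardware map as parsed data: only entries with
# connected set become test targets, keyed by J-Link serial number (id).
hardware_map = [
    {"connected": True,  "id": "000683459357", "platform": "nrf52dk/nrf52832",
     "runner": "nrfjprog", "serial": "/dev/ttyACM0", "baud": 115200},
    {"connected": False, "id": "000683459999", "platform": "nrf52dk/nrf52832",
     "runner": "nrfjprog", "serial": "/dev/ttyACM1", "baud": 115200},
]

targets = {b["id"]: b for b in hardware_map if b["connected"]}
print(sorted(targets))  # only the connected board's id remains
```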

&lt;p&gt;&lt;strong&gt;testcase.yaml&lt;/strong&gt; — test definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/my_feature/testcase.yaml&lt;/span&gt;
&lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;my_app.feature.basic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;platform_allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nrf52dk/nrf52832&lt;/span&gt;
    &lt;span class="na"&gt;harness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;console&lt;/span&gt;
    &lt;span class="na"&gt;harness_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;one_line&lt;/span&gt;
      &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Feature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;initialized&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;successfully"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Self-test&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;passed"&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;feature&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;hil&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;harness: console&lt;/code&gt; finds regex patterns in serial output to determine pass/fail. If "Feature initialized successfully" appears in the log, it passes. If the pattern doesn't appear within the timeout, it fails. Simple — but it catches more than you'd expect.&lt;/p&gt;

&lt;p&gt;Execution command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;west twister &lt;span class="nt"&gt;--device-testing&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hardware-map&lt;/span&gt; hardware-map.yml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-T&lt;/span&gt; tests/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-vv&lt;/span&gt;  &lt;span class="c"&gt;# verbose output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Twister automatically builds the firmware, flashes it to the board listed in hardware-map.yml, captures serial output, matches it against testcase.yaml conditions, and reports results. &lt;code&gt;west flash&lt;/code&gt; internally calls &lt;code&gt;nrfjprog&lt;/code&gt;, which uses the J-Link DLL. In headless environments, the process runs without firmware update dialogs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Serial Logs Can and Can't Catch
&lt;/h3&gt;

&lt;p&gt;"So it just checks whether my predefined log messages appear?" Yes. And that catches more than you'd think.&lt;/p&gt;

&lt;p&gt;When debugging via serial manually, there are two modes: watching logs scroll in real time to confirm that the right message appears at the right moment, or dumping logs to a file and searching for keywords later. Serial log verification in CI is closer to the latter — capture the entire log, then automatically check whether predefined patterns are present or absent.&lt;/p&gt;
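&lt;p&gt;That "capture everything, then search" model fits in a few lines. A hedged sketch with both must-appear and must-not-appear patterns (the pattern strings are examples, and &lt;code&gt;check_log&lt;/code&gt; is my own helper, not a Twister API):&lt;/p&gt;

```python
import re

# Must appear somewhere in the captured log.
REQUIRED = [r"Feature initialized successfully"]
# Must never appear (Zephyr fault banners).
FORBIDDEN = [r"HARD FAULT", r"ZEPHYR FATAL ERROR"]

def check_log(captured):
    """Return a list of failure reasons; an empty list means pass."""
    failures = []
    for pat in REQUIRED:
        if not re.search(pat, captured):
            failures.append("missing: " + pat)
    for pat in FORBIDDEN:
        if re.search(pat, captured):
            failures.append("forbidden: " + pat)
    return failures

print(check_log("app: Feature initialized successfully"))  # []
print(check_log("os: ***** HARD FAULT *****"))             # two failures
```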

&lt;p&gt;Specific firmware scenarios this simple mechanism catches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Boot initialization sequence verification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Firmware typically initializes subsystems in order at boot. BLE stack, then sensor driver, then application logic. Miss a Kconfig option, and a subsystem silently drops out. Manually, you might notice "the log looks shorter than usual" and move on. CI flags it immediately when the "BLE stack initialized" pattern is missing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialization sequence testcase&lt;/span&gt;
&lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;boot.init_sequence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;harness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;console&lt;/span&gt;
    &lt;span class="na"&gt;harness_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;multi_line&lt;/span&gt;
      &lt;span class="na"&gt;ordered&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;[00:00:00.0&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;app:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;System&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;starting"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;[00:00:00.&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ble:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;BLE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;initialized"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;[00:00:00.&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sensor:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;IMU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ready"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;[00:00:01.&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;app:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;All&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;subsystems&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;type: multi_line&lt;/code&gt; with &lt;code&gt;ordered: true&lt;/code&gt; means the patterns must appear &lt;strong&gt;in this exact order&lt;/strong&gt;. Out of order or missing one — fail. I caught an issue this way when the AI refactored code and inadvertently changed the initialization order.&lt;/p&gt;
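&lt;p&gt;The ordered check can be sketched by searching each pattern only after the previous match position (my illustration of the idea, not Twister's code):&lt;/p&gt;

```python
import re

def matches_in_order(log, patterns):
    """True only if every pattern matches, each after the previous match."""
    pos = 0
    for pat in patterns:
        m = re.compile(pat).search(log, pos)
        if m is None:
            return False
        pos = m.end()  # next pattern must match after this point
    return True

patterns = [r"System starting", r"BLE stack initialized", r"IMU ready"]
ok  = "app: System starting\nble: BLE stack initialized\nsensor: IMU ready\n"
bad = "ble: BLE stack initialized\napp: System starting\nsensor: IMU ready\n"
print(matches_in_order(ok, patterns))   # True
print(matches_in_order(bad, patterns))  # False — same lines, wrong order
```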

&lt;p&gt;&lt;strong&gt;2. Automatic Hard Fault detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Calling &lt;code&gt;k_sleep()&lt;/code&gt; in ISR context or dereferencing a null pointer triggers a Hard Fault on ARM Cortex-M. Zephyr's default Fault Handler dumps registers to serial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[00:00:01.234] &amp;lt;err&amp;gt; os: ***** HARD FAULT *****
[00:00:01.234] &amp;lt;err&amp;gt; os:   Fault escalation (see below)
[00:00:01.235] &amp;lt;err&amp;gt; os: r0/a1:  0x00000000  r1/a2:  0x20001234
[00:00:01.235] &amp;lt;err&amp;gt; os: Current thread: 0x20000458 (main)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a pattern that &lt;strong&gt;must not appear&lt;/strong&gt;. You can set it as a failure condition in testcase.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fail unconditionally on Hard Fault&lt;/span&gt;
&lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;safety.no_hard_fault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;harness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;console&lt;/span&gt;
    &lt;span class="na"&gt;harness_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;one_line&lt;/span&gt;
      &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;self-tests&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;passed"&lt;/span&gt;
      &lt;span class="na"&gt;fail_on_fault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# default is true, but stated explicitly for clarity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I were watching the serial monitor myself, I'd spot the Hard Fault dump immediately. But without CI, tracing "which of the 5 commits pushed over the weekend broke it" is painful. CI running this test on every commit tells you exactly which commit introduced the fault — no &lt;code&gt;git bisect&lt;/code&gt; needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Timing-based verification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zephyr logs include timestamps. This lets you verify timing requirements like "BLE advertising must start within 3 seconds of boot":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Verify advertising starts within 3 seconds of boot&lt;/span&gt;
&lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ble.adv_start_timing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;harness&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;console&lt;/span&gt;
    &lt;span class="na"&gt;harness_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;one_line&lt;/span&gt;
      &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;[00:00:0[0-2]&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ble:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Advertising&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;started"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The regex &lt;code&gt;[00:00:0[0-2]\\.\\d+]&lt;/code&gt; only matches timestamps between 0 and 2 seconds. If advertising starts after 3 seconds, the pattern doesn't match, and the test times out as a failure.&lt;/p&gt;
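&lt;p&gt;It's worth sanity-checking a timestamp regex like this locally before wiring it into testcase.yaml:&lt;/p&gt;

```python
import re

# Matches Zephyr timestamps from 00:00:00.x through 00:00:02.x, i.e. under 3 s.
PATTERN = re.compile(r"\[00:00:0[0-2]\.\d+\]")

print(bool(PATTERN.search("[00:00:01.842] ble: Advertising started")))  # True
print(bool(PATTERN.search("[00:00:03.120] ble: Advertising started")))  # False
```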

&lt;p&gt;&lt;strong&gt;4. Memory usage regression detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enabling Zephyr's &lt;a href="https://docs.zephyrproject.org/latest/services/debugging/thread-analyzer.html" rel="noopener noreferrer"&gt;&lt;code&gt;CONFIG_THREAD_ANALYZER&lt;/code&gt;&lt;/a&gt; periodically logs each thread's stack usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[00:00:05.000] &amp;lt;inf&amp;gt; thread_analyzer:  main    : STACK: unused 512 usage 1536 / 2048 (75 %); CPU: 12 %
[00:00:05.000] &amp;lt;inf&amp;gt; thread_analyzer:  ble_rx  : STACK: unused 128 usage 896 / 1024 (87 %); CPU: 3 %
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"unused 128" means only 128 bytes of stack headroom remain. You can pattern-match this and fail when headroom drops below a threshold — catching stack growth early as the AI adds code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this approach can't catch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Serial log pattern matching only verifies "logs I predicted in advance." Unexpected failures — BLE disconnecting after 30 minutes, sensor values drifting at certain temperatures — won't be caught unless you build tests that reproduce those specific conditions.&lt;/p&gt;

&lt;p&gt;Real-time interactive debugging is also outside CI's scope. "Watch serial output while pressing a button at a specific moment" is still a desk job. CI's role is "automatically re-verify known correct behavior on every commit," not "discover new problems." When you do discover a new problem, you write a test for it and add it to CI — that's how test suites naturally grow thicker over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatable Tests vs. Non-automatable Tests
&lt;/h3&gt;

&lt;p&gt;Not everything can be automated with HIL. Drawing the boundary clearly matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatable:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UART/RTT log output verification (string pattern matching)&lt;/li&gt;
&lt;li&gt;State machine transition checks (log state changes, verify sequence)&lt;/li&gt;
&lt;li&gt;Boot time measurement (timestamp-based)&lt;/li&gt;
&lt;li&gt;I2C/SPI device response checks (when sensors are physically connected)&lt;/li&gt;
&lt;li&gt;Memory usage reports (parsing the .map file generated at build time)&lt;/li&gt;
&lt;/ul&gt;
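&lt;p&gt;The last bullet is the easiest starting point. The linker prints a memory summary at the end of every build (the same numbers the .map file contains); a hedged sketch of parsing it, assuming the GNU ld summary format shown in the sample text below:&lt;/p&gt;

```python
import re

# Sample of the memory summary GNU ld prints at the end of a Zephyr build;
# the numbers here are made up for illustration.
SUMMARY = """\
Memory region         Used Size  Region Size  %age Used
           FLASH:      123456 B       512 KB     23.54%
            SRAM:       45678 B        64 KB     69.70%
"""

REGION = re.compile(r"^\s*(\w+):\s+(\d+) B\s+(\d+) KB\s+([\d.]+)%", re.M)

def memory_usage(text):
    """Map region name to (used_bytes, region_kb, percent_used)."""
    usage = {}
    for name, used, size, pct in REGION.findall(text):
        usage[name] = (int(used), int(size), float(pct))
    return usage

print(memory_usage(SUMMARY))
```

&lt;p&gt;Comparing these numbers against the previous build on every PR turns "the binary quietly grew 20%" into a visible CI signal.&lt;/p&gt;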

&lt;p&gt;&lt;strong&gt;Difficult or impossible to automate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BLE RF performance (RSSI, packet error rate) — requires dedicated test equipment&lt;/li&gt;
&lt;li&gt;Analog sensor accuracy — requires a reference input source&lt;/li&gt;
&lt;li&gt;Power consumption measurement — requires a current probe (Zephyr 4.2 added a power measurement harness to Twister, but it needs physical measurement hardware)&lt;/li&gt;
&lt;li&gt;Long-duration stress tests — hits CI execution time limits&lt;/li&gt;
&lt;li&gt;UI/display output — camera-based verification is possible but complex (&lt;a href="https://www.zephyrproject.org/zephyr-4-3-is-here-whats-new/" rel="noopener noreferrer"&gt;Zephyr 4.3 added visual fingerprint matching&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I focused on the "automatable" list. The most common failure patterns in AI-generated code — boot initialization failures, features silently disabled by wrong Kconfig, Hard Faults from ISR context violations — are all catchable via serial logs. Aiming for perfection means never starting. "Automatically catching 80% of the most common failures" is the realistic goal.&lt;/p&gt;
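
&lt;p&gt;The serial-log checks in the "automatable" list are exactly what Twister's console harness encodes. A sketch of a &lt;code&gt;testcase.yaml&lt;/code&gt; (test name, platform, and log strings are illustrative):&lt;br&gt;
&lt;/p&gt;

```yaml
# The test passes only when the device's serial output matches
# these regexes, in order. Strings here are illustrative.
tests:
  app.boot.smoke:
    platform_allow:
      - nrf52dk/nrf52832
    harness: console
    harness_config:
      type: multi_line
      ordered: true
      regex:
        - "Booting Zephyr OS"
        - "i2c: sensor init OK"
        - "bt: advertising started"
```

&lt;p&gt;A boot hang or a silently disabled subsystem then fails CI as a missing log line instead of being discovered by hand.&lt;/p&gt;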




&lt;h2&gt;
  
  
  Plugging CI into the AI Workflow — Closing the Loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Research, Plan, Execute, Test, CI: The Final Workflow
&lt;/h3&gt;

&lt;p&gt;Adding CI to the four-stage loop from &lt;a href="https://reversetobuild.com/ai-firmware-development-workflow/" rel="noopener noreferrer"&gt;series #4&lt;/a&gt; produces this workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y06adht4vsp37rf5otf.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y06adht4vsp37rf5otf.webp" alt="Complete AI firmware development workflow — Research, Plan, Execute, PR, CI Pipeline with HIL testing, and AI feedback loop on failure" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;HIL CI joins the four-stage AI loop, forming a closed feedback loop&lt;/p&gt;

&lt;p&gt;Creating a PR (Pull Request) triggers CI automatically. A build failure surfaces the build log; a test failure surfaces the serial log. Standard CI so far.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feeding CI Failure Logs Back to the AI
&lt;/h3&gt;

&lt;p&gt;The differentiator is the feedback loop on failure. When CI fails, I pass the serial logs to the AI for root cause analysis and fix suggestions.&lt;/p&gt;

&lt;p&gt;A finding from &lt;a href="https://reversetobuild.com/ai-firmware-development-workflow/" rel="noopener noreferrer"&gt;series #4&lt;/a&gt;: "AI's accuracy is highest when analyzing logs." Logs are factual data, which leaves little room for hallucination. The same applies to CI-captured serial logs. Hand the AI a Hard Fault register dump, stack trace, and error codes, and it provides reasonably accurate analysis: "this address corresponds to this function at this offset, and the probable cause is X."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Workflow example: save logs on CI failure (GitHub Actions)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Save failure logs&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure()&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cp twister-out/*/handler.log artifacts/&lt;/span&gt;
    &lt;span class="s"&gt;cp twister-out/*/device.log artifacts/&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload artifacts&lt;/span&gt;
  &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure()&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure-logs&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;artifacts/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feeding the saved logs to Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Request AI analysis of failure logs locally&lt;/span&gt;
claude &lt;span class="s2"&gt;"This Twister test failed in CI. Analyze device.log."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  @artifacts/device.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop isn't fully automated yet. There's manual intervention between CI failure, log download, and handing it to the AI. Tools like &lt;a href="https://about.gitlab.com/blog/developing-gitlab-duo-blending-ai-and-root-cause-analysis-to-fix-ci-cd/" rel="noopener noreferrer"&gt;GitLab Duo Root Cause Analysis&lt;/a&gt; are narrowing this gap, but no production tool yet auto-analyzes embedded firmware serial logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reusing Skills and Hooks in CI
&lt;/h3&gt;

&lt;p&gt;The Kconfig validation hook from &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;series #3&lt;/a&gt; — a script that greps &lt;code&gt;build/zephyr/.config&lt;/code&gt; and Kconfig sources to catch nonexistent symbols when a &lt;code&gt;.conf&lt;/code&gt; file is modified — also works in CI.&lt;/p&gt;
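
&lt;p&gt;The core of such a script is small. A hedged sketch of the idea (not the actual series #3 script): every symbol set in the &lt;code&gt;.conf&lt;/code&gt; must appear in the generated &lt;code&gt;.config&lt;/code&gt;, either assigned or as "is not set"; a symbol present in neither was silently dropped by Kconfig, which almost always means it doesn't exist.&lt;br&gt;
&lt;/p&gt;

```shell
# Hedged sketch of a validate_kconfig.sh (illustrative, not the
# actual series script): flag CONFIG_* symbols set in the .conf
# that never surface in the generated .config.
check_kconfig() {
  conf="$1"
  dotconfig="$2"
  fail=0
  for sym in $(grep -o '^CONFIG_[A-Za-z0-9_]*' "$conf"); do
    if ! grep -q -e "^${sym}=" -e "^# ${sym} is not set" "$dotconfig"; then
      echo "error: ${sym} not in ${dotconfig} (nonexistent symbol?)"
      fail=1
    fi
  done
  return "$fail"
}

# Demo on synthetic files: one valid symbol, one hallucinated one
printf 'CONFIG_I2C=y\nCONFIG_SPI_NRFX_SPIM3=y\n' > /tmp/demo_prj.conf
printf 'CONFIG_I2C=y\n' > /tmp/demo_dotconfig
check_kconfig /tmp/demo_prj.conf /tmp/demo_dotconfig || echo "validation failed"
```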

&lt;p&gt;The approach is straightforward. Include the hook script in the repo and run it right after the build step in the CI workflow, once &lt;code&gt;build/zephyr/.config&lt;/code&gt; exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run Kconfig validation hook in CI&lt;/span&gt;
- name: Validate Kconfig
  run: |
    west build &lt;span class="nt"&gt;-b&lt;/span&gt; nrf52dk/nrf52832
    ./scripts/validate_kconfig.sh prj.conf build/zephyr/.config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Claude Code hook fires when the AI modifies a &lt;code&gt;.conf&lt;/code&gt; file; the CI check catches the same class of mistake when a human edits &lt;code&gt;.conf&lt;/code&gt; by hand. The same validation logic, running at two points. Tools created during AI collaboration naturally extend into CI infrastructure — that's the compounding effect of the pipeline built across this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  Remaining Gaps and Next Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What HIL CI Still Can't Catch
&lt;/h3&gt;

&lt;p&gt;I need to be honest. Adding HIL CI doesn't mean every hardware problem is automatically caught:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RF performance:&lt;/strong&gt; BLE connection stability, RSSI, and packet error rate require measurement equipment (sniffer, spectrum analyzer). Serial logs only tell you "connection succeeded/failed," not "why it failed."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term stability:&lt;/strong&gt; Memory leaks and stack overflows only surface after hours or days of operation. CI workflows typically run for minutes to tens of minutes — too short to catch these.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power consumption:&lt;/strong&gt; Current profiles of sleep/wake cycles can't be measured without a current probe. &lt;a href="https://docs.zephyrproject.org/latest/releases/release-notes-4.2.html" rel="noopener noreferrer"&gt;Zephyr 4.2 added a power measurement harness to Twister&lt;/a&gt;, but it requires physical measurement hardware on the runner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-device interaction:&lt;/strong&gt; BLE Central-Peripheral communication and mesh network behavior require controlling multiple boards simultaneously. Possible, but setup complexity escalates sharply.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just as I noted in series #4 that "security-related code (encryption, Secure Boot, OTA signing) stays manually written," HIL CI also requires consciously defining the boundary between "what to automate" and "what a human verifies."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Costs
&lt;/h3&gt;

&lt;p&gt;Maintaining a HIL CI pipeline has costs. I won't sugarcoat them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimum hardware:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raspberry Pi 4 (~$55) + SD card + power adapter&lt;/li&gt;
&lt;li&gt;nRF52 DK ($40) + USB cable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total: ~$100&lt;/strong&gt; (one-time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hidden operational costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS updates, security patches — neglect these and you have a security hole&lt;/li&gt;
&lt;li&gt;SD card lifespan — heavy writes mean replacement every 1-2 years&lt;/li&gt;
&lt;li&gt;USB connection instability — the board occasionally drops off and requires a physical reconnect&lt;/li&gt;
&lt;li&gt;GitHub was expected to introduce a $0.002/min platform fee for self-hosted runners on private repos starting March 2026, but community pushback led to an indefinite postponement. Worth watching for future changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For individuals or small teams running fewer than 1,000 builds per month, the cloud hosting cost savings are negligible. But if you've ever lost half a day to "the build passed but the board doesn't work," the $100 upfront investment pays for itself. Measure the value not in dollars, but in time and trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reflecting on Five Posts
&lt;/h3&gt;

&lt;p&gt;This post wraps up the technical content of the series. Here's the pipeline built across all five posts at a glance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Post&lt;/th&gt;
&lt;th&gt;Pipeline Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;a href="https://reversetobuild.com/bare-metal-zephyr-antigravity-setup/" rel="noopener noreferrer"&gt;Antigravity IDE&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Development environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://reversetobuild.com/ncs-freestanding-t2-t3-guide/" rel="noopener noreferrer"&gt;NCS T2 Topology&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Project structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;Claude Code Skills + Hooks&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AI tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;a href="https://reversetobuild.com/ai-firmware-development-workflow/" rel="noopener noreferrer"&gt;Research → Plan → Execute → Test Loop&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AI workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;HIL CI (this post)&lt;/td&gt;
&lt;td&gt;Automated verification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Environment, structure, tooling, methodology, verification. Each layer stands on the one below it. The IDE provides the environment; the T2 topology isolates projects; Claude Code skills and hooks catch AI hallucinations on that foundation; the four-stage loop structures the workflow. And HIL CI verifies it all on real hardware.&lt;/p&gt;

&lt;p&gt;I know this setup isn't perfect. But going from "I tried having AI write firmware and it didn't work" to "a repeatable process for building firmware with AI" — that's real progress.&lt;/p&gt;

&lt;p&gt;The next post will look back at the entire five-post journey and distill what I learned at the intersection of AI and embedded firmware development — what worked, and what remains firmly in the human domain.&lt;/p&gt;

</description>
      <category>zephyr</category>
      <category>firmware</category>
      <category>hilci</category>
      <category>twister</category>
    </item>
    <item>
      <title>How I Build Firmware with AI — A Research, Plan, Execute, Test Loop in Practice</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Sat, 11 Apr 2026 21:44:44 +0000</pubDate>
      <link>https://dev.to/erratahunter/how-i-build-firmware-with-ai-a-research-plan-execute-test-loop-in-practice-178b</link>
      <guid>https://dev.to/erratahunter/how-i-build-firmware-with-ai-a-research-plan-execute-test-loop-in-practice-178b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tell an AI "implement this" in firmware and you get nonexistent register addresses and ISR-incompatible APIs that pass the build but brick the board.&lt;/li&gt;
&lt;li&gt;A 4-stage loop — research, plan, execute, test — with two human gates (datasheet cross-check, design review) stops bad information from propagating into code.&lt;/li&gt;
&lt;li&gt;AI output needs human verification during research and planning, but for log analysis AI is faster than any human — calibrate AI involvement per stage.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;I expected AI coding tools to boost my productivity. Half right, half wrong. I develop &lt;a href="https://www.zephyrproject.org/" rel="noopener noreferrer"&gt;Zephyr RTOS&lt;/a&gt;-based firmware for nRF52/nRF53 using Claude Code as my primary tool, and the first few weeks actually made things worse. &lt;a href="https://reversetobuild.com/claude-code-embedded-firmware-development/" rel="noopener noreferrer"&gt;As I covered in a previous post&lt;/a&gt;, the AI confidently recommended Kconfig symbols that don't exist, generated register settings off by a single bit from the datasheet, and wrote code calling APIs that must never run in interrupt context.&lt;/p&gt;

&lt;p&gt;The problem wasn't the AI's capability — it was how I used it. Copy-pasting AI output without verification might work for web frontends, but in firmware it's the fastest way to brick a board. After a month of trial and error, I settled on a &lt;strong&gt;research → plan → execute → test&lt;/strong&gt; loop. I don't start firmware work without it now.&lt;/p&gt;

&lt;p&gt;This is a field report on what I delegate to AI at each stage, where I intervene personally, and which pitfalls are specific to the firmware domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Move Fast and Fix Things" Doesn't Work in Firmware
&lt;/h2&gt;

&lt;p&gt;In web development, you edit code and hot reload gives you instant feedback. Something breaks, the browser console tells you, you fix it. The feedback loop runs in seconds.&lt;/p&gt;

&lt;p&gt;Firmware is different. A wrong clock configuration can render the MCU unresponsive. Misconfigure a single GPIO pin and overcurrent can physically damage external circuitry. Miss a watchdog timer setup and the device enters an infinite reset loop — tracking down the cause means connecting a &lt;a href="https://www.segger.com/products/debug-probes/j-link/" rel="noopener noreferrer"&gt;J-Link&lt;/a&gt; debugger and stepping through the boot sequence line by line. The cost of "just try it and fix later" is in a different league from the web.&lt;/p&gt;

&lt;p&gt;Three reasons AI is particularly dangerous in this domain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, hallucinations pass compilation.&lt;/strong&gt; When an LLM generates a nonexistent register address or incorrect bit mask, the C compiler treats it as a constant. The build succeeds. The problem only surfaces when you flash the board. In web development, calling a nonexistent API triggers an immediate runtime error. In firmware, "silent failures" are far more common.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, register maps differ between variants in the same chip family.&lt;/strong&gt; &lt;a href="https://www.nordicsemi.com/Products/nRF52832" rel="noopener noreferrer"&gt;nRF52832&lt;/a&gt; and &lt;a href="https://www.nordicsemi.com/Products/nRF52840" rel="noopener noreferrer"&gt;nRF52840&lt;/a&gt; are both nRF52 series, but their peripheral configurations differ. When AI sees nRF52832 code in its training data and applies it directly to an nRF52840 target, the build passes but the hardware doesn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, code generation without domain context can produce fundamentally wrong patterns.&lt;/strong&gt; I've seen AI write a UART receive handler using dynamic memory allocation and callback chains. Reasonable in Linux userspace, but putting &lt;code&gt;malloc&lt;/code&gt; in a UART handler running in ISR context on an MCU with 256KB of RAM leads to crashes at unpredictable times. A static ring buffer is the right answer, but AI proposes the pattern it's seen most.&lt;/p&gt;

&lt;p&gt;Using AI in this environment requires a structure: provide sufficient context before generation, and verify the output after. That's the starting point of the 4-stage loop I've built.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 4-Stage Loop
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl974smlsn9114xt0wh1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl974smlsn9114xt0wh1.webp" alt="Four-stage firmware development loop with two human verification gates between Research→Plan and Plan→Execute" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI generates artifacts at each stage; humans verify at two gates.&lt;/p&gt;

&lt;p&gt;The critical elements are the &lt;strong&gt;two human gates&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 1 (between Research and Plan):&lt;/strong&gt; I cross-check the AI's research output against original documentation. If a hallucination slips through here, the bad information propagates into the plan and then into the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gate 2 (between Plan and Execute):&lt;/strong&gt; I review the code snippets and constraints in the AI-generated plan. Wrong init priority ordering or blocking API calls inside ISR handlers must be caught at this gate. Design errors that pass this point manifest as "build succeeds, flash succeeds, but crashes under specific conditions" — the worst debugging scenario.&lt;/p&gt;

&lt;p&gt;Compared to the web's "code → hot reload → verify" loop, this has more steps. But in firmware, the time cost of build → flash → hardware verification is so long that "discovering bad code late" is far more expensive than "planning carefully up front." The 4-stage loop reflects that cost structure.&lt;/p&gt;

&lt;p&gt;Every stage produces a &lt;code&gt;.md&lt;/code&gt; file. Not a chat response that vanishes when the session ends, but a document that persists in the file system. If the session disconnects or the context window resets, I reload the previous stage's artifact and pick up where I left off. This "persistent document chain" is the infrastructure that holds the loop together.&lt;/p&gt;
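
&lt;p&gt;One way such a document chain can look on disk (file names are illustrative, not a prescribed layout):&lt;br&gt;
&lt;/p&gt;

```
docs/twim-driver/
├── research.md    # Stage 1 output, cross-checked at Gate 1
├── plan.md        # Stage 2 checklist and snippets, reviewed at Gate 2
└── execution.md   # Stage 3 log: build errors, workarounds, decisions
```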




&lt;h2&gt;
  
  
  Research — What Happens When You Feed 200 Pages of Datasheet to AI
&lt;/h2&gt;

&lt;p&gt;This stage has the best time-to-value ratio of all four. When I need to work with a new peripheral, I ask AI for a structured summary instead of reading the datasheet cover to cover.&lt;/p&gt;

&lt;p&gt;A common mistake here: &lt;strong&gt;feeding the entire datasheet PDF to the AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCU datasheets typically run 200–800 pages. The &lt;a href="https://www.nordicsemi.com/Products/nRF5340" rel="noopener noreferrer"&gt;nRF5340&lt;/a&gt; Product Specification alone is hundreds of pages. Dumping all of it into context burns a significant number of input tokens. The bigger problem: with hundreds of pages loaded at once, the AI loses focus on the relevant section and starts pulling patterns from unrelated information.&lt;/p&gt;

&lt;p&gt;My approach: &lt;strong&gt;feed it section by section.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If I need to implement an I2C driver, I extract just the I2C (TWI/TWIM) chapter from the datasheet. "Read only Section 6.13 TWIM from this PDF and organize the following items." I add the register map table and timing diagram pages if needed. This reduces token cost while narrowing the AI's focus, improving accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Principles for AI Research Tasks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Explicitly ask for deep analysis.&lt;/strong&gt; Skip this and the AI returns a surface-level paraphrase of the first paragraph. The prompt structure I actually use looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deep-dive into this MCU's TWIM (I2C Master) peripheral on the following points:
1. Init sequence — full order from clock enable to first transaction
2. Per-register bit field meanings — especially FREQUENCY, ADDRESS, ERRORSRC
3. Whether DMA setup is required or manual byte transfer is possible
4. Clock stretching support and timeout configuration
5. Any discrepancies between the official SDK [nrfx_twim](https://github.com/NordicSemiconductor/nrfx) driver and the datasheet
6. Trade-offs of each approach (DMA vs interrupt-driven, polling vs event-driven)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Ask about trade-offs from the research stage.&lt;/strong&gt; DMA frees the CPU but consumes a DMA channel and adds configuration complexity. Interrupt-driven is simpler to implement but increases CPU load at high communication speeds. Gathering this decision material during research speeds up decision-making during the planning stage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Save results to a &lt;code&gt;.md&lt;/code&gt; file.&lt;/strong&gt; I instruct: "Save the research results to research.md. Include code snippets (register setup examples, SDK API call patterns) for each item." Chat responses disappear when the session ends. A &lt;code&gt;.md&lt;/code&gt; file can be reloaded as context for the planning stage, and it's easy to cross-check against the original datasheet side by side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gate 1: Human Cross-Verification
&lt;/h3&gt;

&lt;p&gt;The most important action at this stage: &lt;strong&gt;comparing the AI's summary against the original datasheet.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the AI reports "the TWIM FREQUENCY register value 0x06400000 corresponds to 400kHz," I verify that value directly in the datasheet's register map table. In my experience, AI gets register addresses and bit field values wrong roughly 10–15% of the time. Most errors come from mixing data between similar chip variants. Skip this gate, and incorrect register values propagate through the plan into actual code, manifesting as I2C communication failures on the board. Tracking that down might require an oscilloscope.&lt;/p&gt;

&lt;p&gt;The research review takes me 15–30 minutes. Reading the datasheet from scratch without AI would take 2–3 hours. AI summary + cross-check in under 30 minutes. That time saving is the primary reason I use AI for research.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruz6df2mwsvbup5otfr5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruz6df2mwsvbup5otfr5.webp" alt="Datasheet feeding strategy comparison — full 700-page PDF input versus section-by-section approach showing accuracy and token cost tradeoffs" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feeding the full datasheet raises token cost and lowers accuracy. Section-by-section is the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Planning — Agree on the Design Before Writing Code
&lt;/h2&gt;

&lt;p&gt;After research, the urge to start coding is strong. Resisting that urge is the second key to this workflow.&lt;/p&gt;

&lt;p&gt;In the planning stage, I ask the AI: "Using the research document as reference, plan which files to modify, in what order, using which APIs." I always specify a few things explicitly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Checklist Format
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Implementation Plan: TWIM I2C Driver&lt;/span&gt;

&lt;span class="gu"&gt;### Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Only &lt;span class="sb"&gt;`k_sem_give()`&lt;/span&gt; allowed in TWIM ISR, &lt;span class="sb"&gt;`k_malloc()`&lt;/span&gt; forbidden
&lt;span class="p"&gt;-&lt;/span&gt; init priority: TWIM at POST_KERNEL level, after device default priority (40)
&lt;span class="p"&gt;-&lt;/span&gt; I2C bus shared by 2 sensors → mutex required

&lt;span class="gu"&gt;### Implementation Items&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 1. Enable TWIM node in Devicetree overlay
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 2. Add &lt;span class="sb"&gt;`CONFIG_I2C`&lt;/span&gt;, &lt;span class="sb"&gt;`CONFIG_NRFX_TWIM0`&lt;/span&gt; to Kconfig
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 3. Write i2c_wrapper.h — define init, read, write APIs
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 4. Implement i2c_wrapper.c — nrfx_twim based, mutex-protected
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 5. Switch sensor A driver to use i2c_wrapper calls
&lt;span class="p"&gt;-&lt;/span&gt; [ ] 6. Build verification and basic I2C scan test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checkboxes (&lt;code&gt;- [ ]&lt;/code&gt;) serve a purpose. During execution, I tell the AI "implement item 1 and mark the checkbox as [x]." When a session breaks or I resume the next day, opening this &lt;code&gt;.md&lt;/code&gt; file immediately shows what's done and where things stalled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Include Code Snippets
&lt;/h3&gt;

&lt;p&gt;"Add TWIM node to the &lt;a href="https://docs.zephyrproject.org/latest/build/dts/index.html" rel="noopener noreferrer"&gt;Devicetree&lt;/a&gt; overlay" alone is unreviewable. I have the AI write actual code snippets at the planning stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Before: no TWIM node in app.overlay */

/* After */
&amp;amp;i2c0 {
    compatible = "nordic,nrf-twim";
    status = "okay";
    pinctrl-0 = &amp;lt;&amp;amp;i2c0_default&amp;gt;;
    pinctrl-1 = &amp;lt;&amp;amp;i2c0_sleep&amp;gt;;
    pinctrl-names = "default", "sleep";
    clock-frequency = &amp;lt;I2C_BITRATE_FAST&amp;gt;;  /* 400 kHz */
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets me check specifics during review: "Does the pinctrl name match the actual board DTS?", "Is &lt;code&gt;I2C_BITRATE_FAST&lt;/code&gt; supported on this chip?" You can't review an abstract plan. You can review a code snippet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Record Trade-offs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Trade-off Analysis&lt;/span&gt;
| Option | Pros | Cons |
|--------|------|------|
| nrfx_twim (HAL) | Direct control, minimal overhead | No Zephyr DTS integration |
| Zephyr i2c API | DTS auto-binding, portable | Abstraction layer overhead |
→ &lt;span class="gs"&gt;**Choice: Zephyr i2c API**&lt;/span&gt; — sensor drivers already use Zephyr APIs, so compatibility wins.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This record pays off when future-me asks "why did I do it this way?" A few weeks later, when a performance issue prompts considering a switch to nrfx_twim, the decision context is right there in the &lt;code&gt;.md&lt;/code&gt; file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gate 2: Human Design Review
&lt;/h3&gt;

&lt;p&gt;Three points I focus on during plan review:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.zephyrproject.org/latest/kernel/drivers/index.html" rel="noopener noreferrer"&gt;Init priority&lt;/a&gt; ordering:&lt;/strong&gt; Wrong driver init order in Zephyr causes null pointer dereferences at boot. AI frequently overlooks this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ISR context constraints:&lt;/strong&gt; AI often fails to distinguish APIs callable from interrupt handlers vs. thread context. &lt;a href="https://docs.zephyrproject.org/latest/kernel/services/synchronization/mutexes.html" rel="noopener noreferrer"&gt;&lt;code&gt;k_mutex_lock()&lt;/code&gt; cannot be used in ISR&lt;/a&gt; — catch it here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared resources:&lt;/strong&gt; Missing mutex protection on a shared I2C bus, incorrect SPI CS pin management.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Design errors that pass this gate produce "build success, flash success, but crash under specific conditions" — the worst scenario. Timing-dependent bugs are hard to reproduce and can eat half a day to track down. Thirty minutes of careful plan review saves four hours of debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  Execution — One Item at a Time, Check After Each
&lt;/h2&gt;

&lt;p&gt;Once the plan is approved, I have the AI write code. The principle is simple: &lt;strong&gt;execute one item, verify the build, mark it complete, then move to the next.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Me: "Implement item 1 from plan.md. Mark the checkbox [x] when done."
AI: (modifies Devicetree overlay, marks checkbox)
Me: west build → success confirmed
Me: "Implement item 2."
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applying multiple changes at once in firmware makes build errors extremely hard to trace. When Kconfig and source code changes land simultaneously, just separating "is this a config problem or a code problem?" wastes time. One-item-at-a-time execution narrows error causes to exactly one change.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Build Errors Occur
&lt;/h3&gt;

&lt;p&gt;Zephyr/west build errors are notoriously unfriendly. CMake configuration errors, Kconfig dependency conflicts, Devicetree binding mismatches, and linker errors pour out in dozens of log lines. This is where AI excels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paste the full error log.&lt;/strong&gt; Not "I got a build error" — copy the entire terminal output. AI extracts the actual error line from verbose CMake traces and pinpoints causes like "this error is a dependency conflict because &lt;code&gt;CONFIG_I2C&lt;/code&gt; is enabled but &lt;code&gt;CONFIG_GPIO&lt;/code&gt; is missing." I use AI to identify the error category; I decide the actual fix based on the plan's context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document Execution History
&lt;/h3&gt;

&lt;p&gt;I record build errors, workarounds, and unexpected behavior in a &lt;code&gt;.md&lt;/code&gt; file. Short entries like "item 3: &lt;code&gt;CONFIG_NRFX_TWIM0&lt;/code&gt; deprecated, used &lt;code&gt;CONFIG_I2C_NRFX_TWIM&lt;/code&gt; instead."&lt;/p&gt;

&lt;p&gt;This record pays off in two situations. First, when a similar project hits the same issue, I hand the past record to AI and it gets "we solved this before" context immediately. Second, when the context window resets after a long conversation, reloading the execution log &lt;code&gt;.md&lt;/code&gt; restores the current state.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing — Paste Logs, AI Debugs
&lt;/h2&gt;

&lt;p&gt;The reality of firmware testing: no matter how thorough the unit tests, on-board verification is the final check. All I2C driver unit tests can pass, but if clock stretching timeout hits during actual sensor communication, those unit tests mean nothing.&lt;/p&gt;

&lt;p&gt;When problems occur, my most-used pattern: &lt;strong&gt;paste the entire log into AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I copy runtime logs collected via UART or RTT and ask "analyze the root cause from this log." Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;[00:00:01.234] &amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;twim: TWIM init OK, &lt;span class="nv"&gt;freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;400kHz
&lt;span class="gp"&gt;[00:00:01.240] &amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sensor_a: Starting I2C &lt;span class="nb"&gt;read&lt;/span&gt;, &lt;span class="nv"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0x48
&lt;span class="gp"&gt;[00:00:01.245] &amp;lt;wrn&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;twim: TWIM event: &lt;span class="nv"&gt;ERROR_SRC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0x02 &lt;span class="o"&gt;(&lt;/span&gt;ANACK&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;[00:00:01.245] &amp;lt;err&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sensor_a: I2C &lt;span class="nb"&gt;read &lt;/span&gt;failed: &lt;span class="nt"&gt;-5&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;EIO&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;[00:00:01.250] &amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sensor_a: Retry 1/3
&lt;span class="gp"&gt;[00:00:01.255] &amp;lt;wrn&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;twim: TWIM event: &lt;span class="nv"&gt;ERROR_SRC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0x02 &lt;span class="o"&gt;(&lt;/span&gt;ANACK&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;[00:00:01.260] &amp;lt;inf&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sensor_a: Retry 2/3
&lt;span class="gp"&gt;[00:00:01.265] &amp;lt;wrn&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;twim: TWIM event: &lt;span class="nv"&gt;ERROR_SRC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0x02 &lt;span class="o"&gt;(&lt;/span&gt;ANACK&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;[00:00:01.270] &amp;lt;err&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sensor_a: All retries exhausted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI immediately responds: "ERROR_SRC=0x02 is Address NACK. Verify sensor address 0x48. If correct, suspect missing pull-up resistors or wiring issues." A human reading this log reaches the same conclusion, but looking up whether bit 1 of the ERROR_SRC register is ANACK in the datasheet takes 5 minutes. AI does it in 1 second.&lt;/p&gt;
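&lt;p&gt;That one-second lookup can also be frozen into a tiny helper once the register is understood. A sketch, with bit meanings taken from the nRF52 TWIM &lt;code&gt;ERRORSRC&lt;/code&gt; register (OVERRUN = bit 0, ANACK = bit 1, DNACK = bit 2; verify against your SoC's product specification):&lt;/p&gt;

```shell
# Decode nRF52 TWIM ERRORSRC values into readable flag names.
# Bit layout assumed from the nRF52 product spec: 0x1 OVERRUN,
# 0x2 ANACK (address NACK), 0x4 DNACK (data NACK).
twim_errorsrc() {
  case "$1" in
    0x01) echo "OVERRUN" ;;
    0x02) echo "ANACK (address not acknowledged - check slave address/pull-ups/wiring)" ;;
    0x04) echo "DNACK (data not acknowledged)" ;;
    *)    echo "unknown/combined ERRORSRC: $1" ;;
  esac
}
```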

&lt;p&gt;&lt;a href="https://www.segger.com/products/debug-probes/j-link/technology/about-real-time-transfer/" rel="noopener noreferrer"&gt;RTT (Real-Time Transfer)&lt;/a&gt; logs pair even better with AI than UART. RTT writes directly to a ring buffer in RAM without using any MCU peripheral, so CPU overhead is nearly zero — you can log even in timing-critical sections. Feed AI the ISR timing logs, DMA completion callback ordering, and thread context switch timestamps, and it finds patterns a human would struggle to spot in hundreds of lines: "Interrupts A and B fire in succession with only 8μs between them at this point."&lt;/p&gt;

&lt;p&gt;This is why I consider the testing stage the highest-leverage point for AI in this workflow. During research and planning, AI output requires human verification. But in log analysis, AI is faster than a human, and the margin for error is smaller. Logs are facts, and AI extracts patterns from facts. There's less room for hallucination.&lt;/p&gt;

&lt;p&gt;Limits exist, of course. AI can say "check the pull-up resistors," but picking up a multimeter and measuring resistance is a human job. Capturing SDA/SCL waveforms with a logic analyzer to confirm clock stretching is happening — also human. AI sets the debugging direction, but it cannot replace physical hardware verification.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr9qmdrx0196jzymf22n.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr9qmdrx0196jzymf22n.webp" alt="AI effectiveness spectrum across the 4-stage loop — lowest in code generation, highest in log analysis where inputs are factual data" width="800" height="993"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of all four stages, AI delivers the most value during testing. The input is factual data (logs).&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed and What Didn't
&lt;/h2&gt;

&lt;p&gt;I've used this workflow for over a month. Here's what shifted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changed:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My role moved from "person who writes code" to "person who makes decisions and verifies." Time spent typing code shrank. Time spent cross-checking AI output against datasheets and reviewing constraint sections of implementation plans grew.&lt;/p&gt;

&lt;p&gt;Research time dropped by more than half. When working with a new peripheral, I ask AI for a structured summary and cross-check only the critical parts against the original — much faster than reading the datasheet from page one.&lt;/p&gt;

&lt;p&gt;Debugging patterns changed too. I used to read error logs and mentally cycle through possible causes one by one. Now I paste logs into AI, ask for "top 3 probable causes ranked by likelihood," and start verifying from the most likely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What didn't change:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Physical hardware testing remains beyond AI's reach. Verifying waveforms on an oscilloscope, measuring current draw, testing under various temperature conditions — still a human job.&lt;/p&gt;

&lt;p&gt;I treat AI-generated code more conservatively for security-related work. Encryption key management, secure boot chains, OTA signature verification — a single mistake in these areas can compromise the entire product's security. I use AI for research only in this domain; code generation stays manual.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I want to try next:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm considering connecting Hardware-in-the-Loop (HIL) testing to the CI pipeline. Attach physical boards to a CI server, automatically build → flash → run basic communication tests on AI-generated code. This would tighten the feedback loop after Gate 2. Still in the infrastructure setup phase, but once this loop is automated, AI utility in firmware development takes another step up.&lt;/p&gt;

&lt;p&gt;AI doesn't replace firmware engineers. It helps firmware engineers make better decisions. But getting that help right requires structurally designing "where AI contributes and where humans intervene." The research → plan → execute → test loop is the current version of that design I've found. I plan to keep refining it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>firmware</category>
      <category>zephyr</category>
    </item>
    <item>
      <title>Embedded Firmware Development with Claude Code — Devicetree, Kconfig, and Debugging</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Wed, 25 Mar 2026 21:34:06 +0000</pubDate>
      <link>https://dev.to/erratahunter/embedded-firmware-development-with-claude-code-devicetree-kconfig-and-debugging-4jl7</link>
      <guid>https://dev.to/erratahunter/embedded-firmware-development-with-claude-code-devicetree-kconfig-and-debugging-4jl7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;75% of Zephyr firmware development time goes to Kconfig/devicetree configuration and build error interpretation, not writing code — Claude Code can intervene in that 75% directly from the terminal.&lt;/li&gt;
&lt;li&gt;Kconfig hallucination (AI inventing nonexistent symbols) is structurally preventable by combining a shell script that greps &lt;code&gt;build/zephyr/.config&lt;/code&gt; and Kconfig source files with Claude Code's skill and hook system.&lt;/li&gt;
&lt;li&gt;Feeding all three files (&lt;code&gt;.dts&lt;/code&gt;, &lt;code&gt;.dtsi&lt;/code&gt;, binding YAML) as context via the &lt;code&gt;@&lt;/code&gt; syntax prevents &lt;code&gt;compatible&lt;/code&gt; string hallucination in devicetree overlays.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI coding tools still feel distant for embedded firmware developers. "Can AI even understand devicetree?" "What if it invents Kconfig symbols?" — reasonable doubts. I had them too.&lt;/p&gt;

&lt;p&gt;I deployed &lt;a href="https://code.claude.com/" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; on production &lt;a href="https://docs.zephyrproject.org/latest/" rel="noopener noreferrer"&gt;Zephyr&lt;/a&gt;/NCS firmware projects at my day job. Kconfig hallucination broke builds. I built safeguards using Claude Code's skill and hook system to fix it. This post covers the methodology I developed. I can't share proprietary code, but I can explain where things break when you hand devicetree and Kconfig to AI, and how to structurally prevent it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Left the GUI IDE for the Terminal
&lt;/h2&gt;

&lt;p&gt;A typical day for a Zephyr firmware developer breaks down like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Run &lt;code&gt;west build&lt;/code&gt; → interpret CMake/Kconfig/devicetree errors&lt;/li&gt;
&lt;li&gt;  Edit &lt;code&gt;prj.conf&lt;/code&gt; → dig through &lt;code&gt;menuconfig&lt;/code&gt; to find the right &lt;code&gt;CONFIG_*&lt;/code&gt; symbols&lt;/li&gt;
&lt;li&gt;  Write devicetree overlays → cross-reference &lt;code&gt;.dts&lt;/code&gt;, &lt;code&gt;.dtsi&lt;/code&gt;, and binding YAML simultaneously&lt;/li&gt;
&lt;li&gt;  Flash and check serial logs → track down bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Actual C code writing accounts for roughly 25% of the day. The remaining 75% is spent fighting configuration files and error logs.&lt;/p&gt;

&lt;p&gt;GUI IDE AI features focus on code autocompletion — predicting function signatures, suggesting next lines. They help with the 25%. Claude Code runs in the terminal, so it can execute &lt;code&gt;west build&lt;/code&gt; directly, read the entire build log, interpret errors, and suggest next actions. It greps &lt;code&gt;.config&lt;/code&gt; files and traces Kconfig dependencies back to source. &lt;strong&gt;It can intervene in the other 75%&lt;/strong&gt; — and that's why a terminal-based AI agent has an edge in embedded.&lt;/p&gt;

&lt;p&gt;Dedicated embedded AI tools are emerging. &lt;a href="https://embedder.com/" rel="noopener noreferrer"&gt;Embedder&lt;/a&gt; (YC S25) generates driver code from uploaded PDF datasheets and is preparing serial console and GDB integration. If you work within supported chipsets (STM32, ESP32) using standard workflows, these packaged tools deliver real productivity gains.&lt;/p&gt;

&lt;p&gt;I chose Claude Code instead. The reason comes down to what software engineering calls &lt;strong&gt;the double-edged sword of opinionated design&lt;/strong&gt;. Packaged tools are productive within the Golden Path their creators designed, but friction increases the moment you step outside it. When the tool's assumed build pipeline doesn't match your project structure, you end up contorting your workflow to fit the tool. When the tool's baked-in "best practices" conflict with your hardware's nonstandard constraints, AI suggestions derail rather than help. Embedded development — where hardware configuration, SDK structure, and build pipelines vary project to project — makes this especially pronounced.&lt;/p&gt;

&lt;p&gt;It's a tradeoff between convenience and control. I deal with Zephyr/NCS &lt;a href="https://reversetobuild.com/ncs-freestanding-t2-t3-guide/" rel="noopener noreferrer"&gt;west workspace structures&lt;/a&gt; and per-project build configurations, so I chose to tune a general-purpose tool to fit my pipeline directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Devicetree: Teaching Hardware to AI
&lt;/h2&gt;

&lt;p&gt;Zephyr's &lt;a href="https://docs.zephyrproject.org/latest/build/dts/index.html" rel="noopener noreferrer"&gt;devicetree system&lt;/a&gt; has three layers. SoC &lt;code&gt;.dtsi&lt;/code&gt; files define base hardware. Board &lt;code&gt;.dts&lt;/code&gt; files declare pin mappings. &lt;a href="https://docs.zephyrproject.org/latest/build/dts/bindings.html" rel="noopener noreferrer"&gt;Binding YAML&lt;/a&gt; files specify valid properties for each node. Developers must cross-reference all three layers when writing overlays.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19h0jsf0tjzx6l6mvduw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19h0jsf0tjzx6l6mvduw.webp" alt="Zephyr devicetree three-layer system — SoC dtsi, board dts, and binding YAML flow through overlay to final generated header" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To get accurate overlays from AI, you need to feed all three files as context.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Context Means Hallucination
&lt;/h3&gt;

&lt;p&gt;Tell Claude Code "add SPI flash to this board" without context and you'll get a plausible but wrong overlay. The &lt;code&gt;compatible&lt;/code&gt; string won't match any actual binding, or it'll reference a nonexistent node label.&lt;/p&gt;

&lt;p&gt;For accurate results, feed all three files explicitly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Read @boards/arm/nrf52840dk_nrf52840.dts and
@zephyr/dts/arm/nordic/nrf52840.dtsi, then reference
@zephyr/dts/bindings/spi/spi-device.yaml to write
an overlay adding W25Q128 flash to SPI1."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the &lt;code&gt;@&lt;/code&gt; syntax pointing to specific files, Claude Code correctly references existing SPI node labels from the board and doesn't miss required properties (&lt;code&gt;reg&lt;/code&gt;, &lt;code&gt;spi-max-frequency&lt;/code&gt;) defined in the binding YAML.&lt;/p&gt;
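&lt;p&gt;For reference, the overlay that prompt should yield looks roughly like this. &lt;code&gt;jedec,spi-nor&lt;/code&gt; is Zephyr's standard binding for SPI NOR flash, and the JEDEC ID and size shown are the W25Q128's published values, but verify every property against the binding YAML and the part's datasheet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;amp;spi1 {
    status = "okay";

    w25q128: w25q128@0 {
        compatible = "jedec,spi-nor";
        reg = &amp;lt;0&amp;gt;;
        spi-max-frequency = &amp;lt;8000000&amp;gt;;
        jedec-id = [ef 40 18];
        size = &amp;lt;0x8000000&amp;gt;; /* in bits (128 Mbit) */
    };
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;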

&lt;h3&gt;
  
  
  Three Patterns Where AI Gets Devicetree Wrong
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;compatible&lt;/code&gt; string hallucination&lt;/strong&gt; — invents a compatible that doesn't exist in any binding. Feeding the binding YAML as context prevents this.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Node address collision&lt;/strong&gt; — &lt;code&gt;@0&lt;/code&gt; must match &lt;code&gt;reg = &amp;lt;0&amp;gt;&lt;/code&gt;, but AI assigns addresses that duplicate existing nodes. Feeding the board &lt;code&gt;.dts&lt;/code&gt; lets it check existing assignments.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ignoring overlay detection rules&lt;/strong&gt; — Zephyr's build system auto-detects overlays in this order when &lt;code&gt;DTC_OVERLAY_FILE&lt;/code&gt; is unset: &lt;code&gt;socs/&amp;lt;SOC&amp;gt;.overlay&lt;/code&gt; → &lt;code&gt;boards/&amp;lt;BOARD&amp;gt;.overlay&lt;/code&gt; → &lt;code&gt;&amp;lt;BOARD&amp;gt;.overlay&lt;/code&gt; → &lt;code&gt;app.overlay&lt;/code&gt;. If AI creates an overlay with an arbitrary filename, the build system ignores it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The third problem is solved by adding one line to CLAUDE.md:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Devicetree
- Overlay files must be named `app.overlay` or `boards/&amp;lt;BOARD&amp;gt;.overlay`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;AI's role in devicetree work isn't "writing overlays for you." It's &lt;strong&gt;reducing the overhead of cross-referencing three files&lt;/strong&gt; — checking available nodes in &lt;code&gt;.dtsi&lt;/code&gt;, pulling required properties from binding YAML, and combining settings that don't conflict with existing &lt;code&gt;.dts&lt;/code&gt;. The question is whether you do this manually by switching between three files, or hand all three to AI and get it in one pass.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kconfig: Building a Skill to Prevent AI Hallucination
&lt;/h2&gt;

&lt;p&gt;Zephyr's &lt;a href="https://docs.zephyrproject.org/latest/build/kconfig/index.html" rel="noopener noreferrer"&gt;Kconfig system&lt;/a&gt; has thousands of symbols in a tree structure. Enabling &lt;code&gt;CONFIG_BT&lt;/code&gt; auto-activates &lt;code&gt;NET_BUF&lt;/code&gt;. Choosing &lt;code&gt;CONFIG_BT_HCI&lt;/code&gt; under &lt;code&gt;BT_STACK_SELECTION&lt;/code&gt; triggers another dependency chain. Symbols forced on by &lt;code&gt;select&lt;/code&gt;, conditionally enabled by &lt;code&gt;depends on&lt;/code&gt;, and suggested by &lt;code&gt;imply&lt;/code&gt; are intertwined. The Zephyr project itself &lt;a href="https://github.com/zephyrproject-rtos/zephyr/issues/52575" rel="noopener noreferrer"&gt;acknowledges excessive &lt;code&gt;select&lt;/code&gt; usage&lt;/a&gt; and is migrating to &lt;code&gt;depends on&lt;/code&gt; — that's how complex the dependency structure is.&lt;/p&gt;

&lt;p&gt;Ask AI to "create a minimal prj.conf for BLE Central scan only" and you get a plausible result. But it might invent nonexistent symbols or miss required dependencies.&lt;/p&gt;

&lt;p&gt;When I hit this problem at work, I solved it by making incremental requests to Claude Code. I didn't start with "build me a Kconfig hallucination prevention skill." I asked questions one at a time, and automation emerged naturally.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Conversation Flow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;"Where does the final Kconfig output go after &lt;code&gt;west build&lt;/code&gt;?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;build/zephyr/.config&lt;/code&gt; contains the fully resolved symbol list — thousands of &lt;code&gt;CONFIG_*=y/n&lt;/code&gt; lines. You can also check via &lt;code&gt;menuconfig&lt;/code&gt;/&lt;code&gt;guiconfig&lt;/code&gt;, but this file is the ground truth the build system actually uses.&lt;/p&gt;
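&lt;p&gt;Querying that ground truth is a one-line grep. A sketch (the helper name is mine; disabled symbols appear as comment lines, so both forms are matched):&lt;/p&gt;

```shell
# Print the resolved state of a Kconfig symbol from a built .config.
# Disabled symbols appear as "# CONFIG_X is not set", so match both forms.
kconf_state() {
  # $1 = path to build/zephyr/.config, $2 = symbol name without CONFIG_
  grep -E "^(CONFIG_$2=|# CONFIG_$2 is not set)" "$1"
}
```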

&lt;p&gt;&lt;strong&gt;"Find the .config file in this project's build output."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code locates &lt;code&gt;build/zephyr/.config&lt;/code&gt; and shows its contents. A follow-up question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Does this project have a separate kernel source? Is there a separate kernel .config like in Linux?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike the Linux kernel, Zephyr builds the app and RTOS kernel into a single binary. &lt;code&gt;build/zephyr/.config&lt;/code&gt; is the entire system's configuration. There's no separate kernel .config.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Find all the Kconfig source files in the west workspace."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;zephyr/Kconfig&lt;/code&gt;, &lt;code&gt;zephyr/subsys/bluetooth/Kconfig&lt;/code&gt;, Kconfig files in each driver directory — organized into a tree. These source files contain each symbol's &lt;code&gt;depends on&lt;/code&gt;, &lt;code&gt;select&lt;/code&gt;, and &lt;code&gt;help&lt;/code&gt; text. If AI references these when modifying &lt;code&gt;prj.conf&lt;/code&gt;, it can't fabricate nonexistent symbols.&lt;/p&gt;

&lt;p&gt;Here's where a problem arises. The &lt;code&gt;.config&lt;/code&gt; file is thousands of lines. Kconfig source files are scattered across the entire west workspace. Reading everything every time wastes tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I want to reference these files when modifying .conf, but minimize token usage. How can I query only the relevant symbols?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code proposes a &lt;strong&gt;shell script + skill combination&lt;/strong&gt;. A script that greps for relevant symbols, called by a skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Build that skill."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Kconfig reference skill appears under &lt;code&gt;.claude/skills/&lt;/code&gt;. When a &lt;code&gt;.conf&lt;/code&gt; modification is requested:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Grep the built &lt;code&gt;.config&lt;/code&gt; for the current state of relevant symbols&lt;/li&gt;
&lt;li&gt; Extract the &lt;code&gt;depends on&lt;/code&gt;, &lt;code&gt;select&lt;/code&gt;, &lt;code&gt;help&lt;/code&gt; text from Kconfig source&lt;/li&gt;
&lt;li&gt; Use this as context when modifying &lt;code&gt;.conf&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of reading thousands of lines, it extracts only the needed symbols and their dependencies.&lt;/p&gt;
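&lt;p&gt;The article doesn't show the skill's actual script, but step 2 can be sketched as a single GNU grep over the workspace (the helper name and the context window size are my choices):&lt;/p&gt;

```shell
# Pull a symbol's definition block (depends on / select / help) out of the
# Kconfig tree without reading whole files. GNU grep; -A 8 is an arbitrary
# context window that covers typical definitions.
kconf_def() {
  # $1 = west workspace root, $2 = symbol name without CONFIG_
  grep -r -n -A 8 --include="Kconfig*" "^config $2\$" "$1"
}
```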

&lt;p&gt;&lt;strong&gt;"I want this skill to trigger only when modifying .conf files. Check what needs to change in .claude/."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code's &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;hook system&lt;/a&gt; handles this. Set &lt;code&gt;PostToolUse&lt;/code&gt; with &lt;code&gt;matcher: "Write|Edit"&lt;/code&gt; in &lt;code&gt;.claude/settings.json&lt;/code&gt;, extract the file path from stdin JSON, and conditionally trigger only for &lt;code&gt;.conf&lt;/code&gt; files:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FILE_PATH=$(jq -r '.tool_input.file_path' &amp;lt; /dev/stdin)
if [[ ! "$FILE_PATH" =~ \.conf$ ]]; then
  exit 0  # ignore non-.conf files
fi
## Run Kconfig reference logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;CLAUDE.md instructions are advisory — AI can ignore them. Hooks are deterministic. They execute without exception. Making Kconfig source reference automatic on every &lt;code&gt;.conf&lt;/code&gt; edit structurally prevents AI from inventing nonexistent symbols.&lt;/p&gt;
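&lt;p&gt;Concretely, the hook registration in &lt;code&gt;.claude/settings.json&lt;/code&gt; can look like this (field names per the hook documentation linked above; the script path is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/kconfig-ref.sh" }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;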

&lt;h3&gt;
  
  
  What This Flow Reveals
&lt;/h3&gt;

&lt;p&gt;Seven short exchanges produced a hallucination prevention skill. Asking "build me a Kconfig hallucination prevention skill" upfront wouldn't have worked — you can't design the solution without knowing &lt;code&gt;.config&lt;/code&gt; exists.&lt;/p&gt;

&lt;p&gt;The pattern here is &lt;strong&gt;discovering AI's limitation, then using the same AI to build a tool that fills the gap&lt;/strong&gt;. Don't try to turn Claude Code into an embedded engineer. Instead, as the engineer, identify AI's weak spots and co-build compensating tools. That approach works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build Error Debugging — Where AI Delivers the Highest ROI
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;west build&lt;/code&gt; fails, the terminal floods with errors from CMake, Kconfig, devicetree, the C compiler, and the linker all mixed together. Even experienced engineers spend time just figuring out whether an error is a devicetree problem or a Kconfig problem.&lt;/p&gt;

&lt;p&gt;Feed the entire build log to Claude Code and it classifies errors by category, then traces root causes. Embedded build errors fall into three categories, each with different AI utility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Devicetree Binding Mismatch
&lt;/h3&gt;

&lt;p&gt;When a &lt;code&gt;compatible&lt;/code&gt; string doesn't match any binding YAML, the build system throws an error. The error message is clear enough to solve without AI, but Claude Code finds the correct binding YAML across hundreds of directories under &lt;code&gt;zephyr/dts/bindings/&lt;/code&gt; and proposes a fix in one step. Manually searching takes time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kconfig Dependency Failure — "Silent Failure"
&lt;/h3&gt;

&lt;p&gt;This one is trickier. Set &lt;code&gt;CONFIG_X=y&lt;/code&gt; in &lt;code&gt;prj.conf&lt;/code&gt;, but if &lt;code&gt;CONFIG_X&lt;/code&gt; has &lt;code&gt;depends on CONFIG_Y&lt;/code&gt; and &lt;code&gt;CONFIG_Y=n&lt;/code&gt;, the Kconfig system silently ignores &lt;code&gt;CONFIG_X&lt;/code&gt;. The build succeeds, but the intended feature doesn't work.&lt;/p&gt;

&lt;p&gt;Have Claude Code compare &lt;code&gt;prj.conf&lt;/code&gt; against &lt;code&gt;build/zephyr/.config&lt;/code&gt; and it finds symbols present in &lt;code&gt;prj.conf&lt;/code&gt; but missing from &lt;code&gt;.config&lt;/code&gt;. Tracing the unmet dependency requires Kconfig source — and the Kconfig reference skill from earlier connects here too.&lt;/p&gt;
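&lt;p&gt;The comparison itself is mechanical enough to script as a first pass before involving the AI. A sketch (helper name is mine; exact-line matching is a simplification, since the resolver may normalize values):&lt;/p&gt;

```shell
# List symbols requested in prj.conf that are absent from the resolved
# .config, i.e. silently dropped by an unmet "depends on".
silent_drops() {
  # $1 = prj.conf, $2 = build/zephyr/.config
  grep -E '^CONFIG_[A-Za-z0-9_]+=' "$1" | while read -r want; do
    grep -q -F -x "$want" "$2" || echo "DROPPED: $want"
  done
}
```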

&lt;h3&gt;
  
  
  Linker Errors
&lt;/h3&gt;

&lt;p&gt;Embedded linker errors are typically RAM/Flash overflow or duplicate symbol definitions. Claude Code reads the linker script (&lt;code&gt;.ld&lt;/code&gt;) and &lt;code&gt;build.map&lt;/code&gt; to identify which object files conflict. For memory overflow, it extracts each section's size from &lt;code&gt;build.map&lt;/code&gt; and suggests reduction priorities.&lt;/p&gt;

&lt;p&gt;Build error debugging is where Claude Code's ROI is highest. Error messages are text, root-cause tracing requires cross-referencing multiple files, and resolution patterns are relatively well-defined.&lt;/p&gt;




&lt;h2&gt;
  
  
  Runtime Debugging: Log Analysis and GDB
&lt;/h2&gt;

&lt;p&gt;After a successful build and flash, a different kind of debugging begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serial Log Pattern Analysis
&lt;/h3&gt;

&lt;p&gt;Capture &lt;code&gt;printk&lt;/code&gt; or Zephyr &lt;code&gt;LOG_MODULE&lt;/code&gt; output over serial and feed it to Claude Code. It identifies timestamp intervals, repeating error code patterns, and state changes preceding specific events. Faster than scrolling through hundreds of lines manually.&lt;/p&gt;
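&lt;p&gt;Pre-computing the timestamp deltas before pasting makes stalls jump out. A sketch assuming Zephyr's default &lt;code&gt;[hh:mm:ss.mmm]&lt;/code&gt; log prefix:&lt;/p&gt;

```shell
# Print the millisecond gap between consecutive log lines so that
# unusually long stalls stand out before handing the log to the AI.
# Assumes Zephyr's default [hh:mm:ss.mmm] timestamp prefix.
log_gaps() {
  awk -F'[][]' '{
    split($2, t, "[:.]")
    ms = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
    if (NR > 1) printf "%+5d ms | %s\n", ms - prev, $0
    prev = ms
  }' "$1"
}
```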

&lt;p&gt;Automating the copy-paste step is possible too. &lt;a href="https://lib.rs/crates/serial-mcp-server" rel="noopener noreferrer"&gt;&lt;code&gt;serial-mcp-server&lt;/code&gt;&lt;/a&gt; is a Rust-based &lt;a href="https://code.claude.com/docs/en/mcp" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; server that exposes UART communication as Claude Code tools — &lt;code&gt;list_ports&lt;/code&gt;, &lt;code&gt;open&lt;/code&gt;, &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;close&lt;/code&gt;. It supports STM32, ESP32, and USB-serial converters like CH340 and FTDI. With MCP configured, you can say "open the serial port and read the log" mid-conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  HardFault Analysis
&lt;/h3&gt;

&lt;p&gt;When a HardFault occurs during J-Link + GDB debugging, feed the call stack and register dump to Claude Code. It interprets Cortex-M &lt;a href="https://developer.arm.com/documentation/dui0552/latest/cortex-m3-peripherals/system-control-block/configurable-fault-status-register" rel="noopener noreferrer"&gt;CFSR (Configurable Fault Status Register)&lt;/a&gt; bits and traces the faulting function to build a cause hypothesis.&lt;/p&gt;

&lt;p&gt;Stack overflows, null pointer dereferences, and unaligned memory access — common patterns — are caught accurately. Nondeterministic bugs like timing issues between DMA completion interrupts and the main loop are harder, since logs alone can't reproduce them. AI help has limits there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations and Workarounds
&lt;/h2&gt;

&lt;p&gt;Areas where Claude Code struggles in embedded:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Binary protocol parsing&lt;/strong&gt; — for byte-packed data like BLE custom profiles or proprietary sensor protocols, AI makes frequent errors in bit shifting and endianness handling. Packed struct field offset calculations vary by compiler and target architecture, and AI overlooks these differences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timing-critical interrupt logic&lt;/strong&gt; — when ISR execution time is constrained to microseconds, AI generates functionally correct code but doesn't optimize for execution time. &lt;code&gt;volatile&lt;/code&gt; access ordering, cache line alignment, and compiler barrier insertion remain the engineer's domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware register maps&lt;/strong&gt; — don't expect AI to know your SoC's register map accurately. It gets the general structure right, but hallucinates on specific bit reset values and reserved bit handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mitigation Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Keep hardware specs in CLAUDE.md, but keep it short.&lt;/strong&gt; &lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Claude Code's official best practices&lt;/a&gt; call this "Progressive Disclosure" — don't dump all information, tell AI how to find it. Instead of pasting the entire register map, write "refer to &lt;code&gt;nrf52840.svd&lt;/code&gt; for nRF52840 register details." Long CLAUDE.md files cause AI to ignore instructions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Hardware
- SoC: [nRF52840](https://www.nordicsemi.com/Products/nRF52840) (Cortex-M4F, 256KB RAM, 1MB Flash)
- Board: nRF52840-DK (PCA10056)
- NCS SDK: v2.9.0
- Register reference: nrf52840.svd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Always provide verification.&lt;/strong&gt; Have AI write code, then run &lt;code&gt;west build&lt;/code&gt; and check the result — it catches its own mistakes. According to Claude Code's official docs, "run tests after every change — this alone increases output quality 2–3x." Unit testing is often impractical in embedded, but &lt;code&gt;west build&lt;/code&gt; pass/fail serves as a first-order verification.&lt;/p&gt;
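&lt;p&gt;That verification loop can be wrapped so the result comes back in a paste-ready form. A sketch (&lt;code&gt;west&lt;/code&gt; is assumed on PATH; the wrapper and its output format are mine, not from the docs):&lt;/p&gt;

```shell
# Rebuild and report pass/fail plus the first error lines, in a form that
# can be fed straight back to the agent. $1 = board target.
verify_build() {
  if west build -b "$1" 1> build.out 2> build.err; then
    echo "BUILD OK"
  else
    echo "BUILD FAILED"
    grep -i "error" build.err | head -10
  fi
}
```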




&lt;h2&gt;
  
  
  Claude Code Configuration Strategy for Embedded Projects
&lt;/h2&gt;

&lt;p&gt;Effective Claude Code configuration for embedded requires separating the roles of CLAUDE.md, skills, and hooks.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Execution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CLAUDE.md&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-session context (board, SDK, build commands)&lt;/td&gt;
&lt;td&gt;Auto-loaded (advisory)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skills&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Domain knowledge, workflows (Kconfig reference, etc.)&lt;/td&gt;
&lt;td&gt;On-demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hooks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-negotiable rules (.conf validation, etc.)&lt;/td&gt;
&lt;td&gt;Auto-executed (deterministic)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h3&gt;
  
  
  CLAUDE.md: Minimal Context Only
&lt;/h3&gt;

&lt;p&gt;CLAUDE.md loads every session. Exclude what AI can infer from code; include only what it can't. Board pin maps, SoC specs, NCS SDK version, and build commands are typical entries.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Build

&lt;ul&gt;
&lt;li&gt;west build -b nrf52840dk/nrf52840 -- -DOVERLAY_CONFIG=overlay-debug.conf&lt;/li&gt;
&lt;li&gt;west flash --runner jlink&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Conventions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Overlay filenames: app.overlay or boards/&amp;lt;BOARD&amp;gt;.overlay&lt;/li&gt;
&lt;li&gt;Kconfig: prj.conf (shared), boards/&amp;lt;BOARD&amp;gt;.conf (board-specific)&lt;/li&gt;
&lt;li&gt;After .conf edits, verify against build/zephyr/.config
&lt;/li&gt;
&lt;/ul&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;


Skills: Isolate Domain Knowledge
&lt;/h3&gt;


&lt;p&gt;Knowledge needed only in specific workflows — like the Kconfig reference skill — belongs in &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;&lt;code&gt;.claude/skills/&lt;/code&gt;&lt;/a&gt;. Putting everything in CLAUDE.md wastes the context window, and as it grows longer, AI starts ignoring instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hooks: Advisory vs. Deterministic
&lt;/h3&gt;

&lt;p&gt;Writing "always reference &lt;code&gt;.config&lt;/code&gt; when editing &lt;code&gt;.conf&lt;/code&gt;" in CLAUDE.md doesn't guarantee compliance — AI can skip it. If a rule must execute without exception, implement it as a hook. Hooks run shell scripts before or after Claude Code's tool usage, independent of AI judgment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build-Debug Cycle
&lt;/h3&gt;

&lt;p&gt;When these components combine, the embedded build-debug cycle looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu9qaess6w6hsd2qfb7t.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqu9qaess6w6hsd2qfb7t.webp" alt="Zephyr firmware build-debug cycle — three feedback loops (west build error loop, serial log runtime loop, Kconfig hook validation) converging on a single edit point, with Claude Code intervening at error classification and pattern analysis stages" width="800" height="1193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three feedback loops converge on a single edit point — Claude Code intervenes at the red and navy nodes&lt;/p&gt;

&lt;p&gt;Hooks auto-validate Kconfig on &lt;code&gt;.conf&lt;/code&gt; edits. AI classifies and traces build errors. AI analyzes serial log patterns. The 75% outside of code writing — that's where AI intervenes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Applying AI to embedded development isn't about how well AI writes code. It's about reducing friction in the other 75% — build configuration, error interpretation, log analysis.&lt;/p&gt;

&lt;p&gt;Claude Code can intervene in that 75%. But embedded's domain specifics (Kconfig hallucination, inaccurate hardware register maps) mean you can't use it out of the box. You need skills and hooks as compensating mechanisms. Having built and deployed these at my day job, I can say the loop of using AI to compensate for AI's own limitations works.&lt;/p&gt;

&lt;p&gt;I wrote this post methodology-first because I can't share proprietary code. When I start a personal side project, I plan to apply the same approach and share devicetree overlay sessions, the Kconfig skill in action, and build error debugging logs — with actual code.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>firmware</category>
      <category>zephyr</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>NCS Project Management Guide: Ditching Global Install to Reclaim Control</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Sun, 15 Mar 2026 21:37:25 +0000</pubDate>
      <link>https://dev.to/erratahunter/ncs-project-management-guide-ditching-global-install-to-reclaim-control-j39</link>
      <guid>https://dev.to/erratahunter/ncs-project-management-guide-ditching-global-install-to-reclaim-control-j39</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Patching SDK internals in an NCS global install silently breaks every other project on your machine.&lt;/li&gt;
&lt;li&gt;Freestanding (manual &lt;code&gt;ZEPHYR_BASE&lt;/code&gt; binding) → T2 (app owns &lt;code&gt;west.yml&lt;/code&gt; manifest) → T3 (separate manifest repo managing multiple apps and SDK) gives you progressive isolation.&lt;/li&gt;
&lt;li&gt;Start with Freestanding. Move to T2 or T3 only when reproducibility or multi-product needs demand it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you first pick up the &lt;a href="https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/index.html" rel="noopener noreferrer"&gt;nRF Connect SDK (NCS)&lt;/a&gt;, the natural move is to follow Nordic's Toolchain Manager or VS Code Extension wizard. &lt;a href="https://reversetobuild.com/bare-metal-zephyr-antigravity-setup/" rel="noopener noreferrer"&gt;When I set up my Zephyr dev environment with Antigravity IDE&lt;/a&gt;, I did exactly that. A few button clicks and the &lt;a href="https://zephyrproject.org/" rel="noopener noreferrer"&gt;Zephyr RTOS&lt;/a&gt; core, libraries, and compiler land neatly under &lt;code&gt;C:\ncs\toolchains&lt;/code&gt; or a &lt;code&gt;v2.x.x&lt;/code&gt; folder.&lt;/p&gt;

&lt;p&gt;The problem: &lt;strong&gt;this convenient global install becomes an unmanageable swamp as projects multiply.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zephyr's ecosystem is well designed. Need to remap board pins? Use an &lt;code&gt;app.overlay&lt;/code&gt; (&lt;a href="https://docs.zephyrproject.org/latest/build/dts/howtos.html" rel="noopener noreferrer"&gt;Devicetree Overlay&lt;/a&gt;). Want to tweak system configuration? Edit &lt;code&gt;prj.conf&lt;/code&gt; (&lt;a href="https://docs.zephyrproject.org/latest/build/kconfig/index.html" rel="noopener noreferrer"&gt;Kconfig&lt;/a&gt;) and override without touching the original source.&lt;/p&gt;

&lt;p&gt;But production firmware development never follows the textbook. Sometimes you have to reach deep into the HAL (Hardware Abstraction Layer) to dodge a chipset errata. Sometimes the only way forward is a monkey patch inside Nordic's &lt;code&gt;nrfxlib&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In a global install, that single hack &lt;strong&gt;silently breaks every other NCS project on your machine&lt;/strong&gt;. On top of that, every new SDK release meant downloading gigabytes onto the main drive all over again.&lt;/p&gt;

&lt;p&gt;Under the banner of convenience, I had lost control of my build environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive: True Isolation and the Essence of Freestanding
&lt;/h2&gt;

&lt;p&gt;Breaking this cycle required flipping the NCS paradigm. Instead of "the Toolchain Manager owns my SDK," the goal was &lt;strong&gt;"my code picks and isolates its own SDK."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first concept I ran into was the &lt;strong&gt;&lt;a href="https://docs.zephyrproject.org/latest/develop/application/index.html" rel="noopener noreferrer"&gt;Freestanding Application&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the Zephyr ecosystem, a &lt;strong&gt;manifest (&lt;code&gt;west.yml&lt;/code&gt;)&lt;/strong&gt; is a dependency recipe file that declares "this project needs Zephyr version X, nRF module version Y." Think of it as the equivalent of Node.js's &lt;code&gt;package.json&lt;/code&gt; or Python's &lt;code&gt;requirements.txt&lt;/code&gt;. One &lt;code&gt;west update&lt;/code&gt; command pulls every source at the exact pinned version.&lt;/p&gt;

&lt;p&gt;Freestanding skips the manifest and any complex topology. It is the most intuitive way to isolate an SDK on a local dev machine. My app lives at &lt;code&gt;D:\Workspace\my_app\&lt;/code&gt;, the SDK sits in a completely separate location, and I bind &lt;code&gt;ZEPHYR_BASE&lt;/code&gt; via an environment variable only in the terminal session where I build.&lt;/p&gt;

&lt;p&gt;This is like mounting just the volumes you need into a Docker container — clean and minimal. It is the lightest starting point for physically separating your project from the SDK.&lt;/p&gt;

&lt;p&gt;But as projects grow, Freestanding hits its limits. That is where Zephyr's official &lt;a href="https://docs.zephyrproject.org/latest/develop/west/workspaces.html" rel="noopener noreferrer"&gt;West Workspace Topology&lt;/a&gt; comes in. Zephyr defines three topologies based on who owns the manifest repository:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;T1 (Zephyr-centric Star):&lt;/strong&gt; Zephyr itself is the manifest. This is what you get from a default &lt;code&gt;west init&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;T2 (App-centric Star):&lt;/strong&gt; Your app is the manifest. The cleanest layout for a single product.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;T3 (Forest):&lt;/strong&gt; A dedicated manifest repo manages multiple apps and the SDK as siblings. Built for multi-product teams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post walks through the progression from Freestanding to T2, then T3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps0q2ek3vg8iktjltncl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fps0q2ek3vg8iktjltncl.webp" alt="NCS project isolation progression — from global install to Freestanding, T2 Star, and T3 Forest topology" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step-by-step progression from global install to Freestanding, T2, and T3&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: Building a Controlled Environment from Scratch
&lt;/h2&gt;

&lt;p&gt;Here is the step-by-step journey from the simplest Freestanding setup, through T2, to T3 Topology.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Pure Freestanding — The Lightest Isolation
&lt;/h3&gt;

&lt;p&gt;First, I stepped out of the Toolchain Manager's shadow and installed the toolchain and &lt;a href="https://docs.zephyrproject.org/latest/develop/west/index.html" rel="noopener noreferrer"&gt;West (Zephyr's meta-tool)&lt;/a&gt; directly inside a Python virtual environment (venv).&lt;/p&gt;

&lt;p&gt;Open a terminal at the app directory and manually bind the SDK dependency. On Windows, a single script call does it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;:: Temporarily bind the system NCS (Zephyr) directory to this session only
&amp;gt; C:\ncs\v2.x.x\zephyr\zephyr-env.cmd

:: Now the build command targets the SDK on the C drive
&amp;gt; west build -b nrf52840dk_nrf52840
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For a quick library test or a lightweight side project, this is enough. The binding lives only inside that terminal session and leaves everything else untouched.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: T2 Topology — Let the App Own Its Dependencies
&lt;/h3&gt;

&lt;p&gt;Freestanding relies on your memory or documentation to track the SDK version. Once you start building a real product, that weakness shows. A teammate should be able to &lt;code&gt;git clone&lt;/code&gt; and reproduce the exact build environment — telling them over Slack which &lt;code&gt;ZEPHYR_BASE&lt;/code&gt; to set is an accident waiting to happen.&lt;/p&gt;

&lt;p&gt;T2 is what Zephyr's official docs call the &lt;strong&gt;Star topology (application is the manifest repository)&lt;/strong&gt;. You place a &lt;code&gt;west.yml&lt;/code&gt; manifest directly inside your app repo, making the app itself the owner of its SDK dependencies.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_product/              # App repo IS the manifest repo (T2)
├── .west/               # Auto-generated by west init
├── app/                 # Application source
│   ├── CMakeLists.txt
│   ├── prj.conf
│   └── src/main.c
├── west.yml             # ★ Pins Zephyr, NRF, and module versions
├── zephyr/              # Pulled by west update
└── modules/             # nrf, hal_nordic, nrfxlib, etc.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pin a specific Zephyr tag or commit hash in &lt;code&gt;west.yml&lt;/code&gt;, and anyone who runs &lt;code&gt;west init -l . &amp;amp;&amp;amp; west update&lt;/code&gt; anywhere gets the exact same SDK version. The reproducibility that Freestanding lacked is now baked in.&lt;/p&gt;

&lt;p&gt;Zephyr's &lt;a href="https://docs.zephyrproject.org/latest/develop/west/manifest.html" rel="noopener noreferrer"&gt;Manifest Imports&lt;/a&gt; feature simplifies &lt;code&gt;west.yml&lt;/code&gt; authoring considerably. Instead of listing dozens of module versions by hand, you import the Zephyr manifest wholesale and only override what you need.&lt;/p&gt;
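&lt;p&gt;A minimal sketch of what that can look like for an NCS-based T2 app — the revision tag here is a placeholder, and you should pin whatever release you actually ship:&lt;/p&gt;

```yaml
# Hypothetical west.yml for a T2 app repo (app owns the manifest).
manifest:
  remotes:
    - name: ncs
      url-base: https://github.com/nrfconnect
  projects:
    - name: nrf
      remote: ncs
      repo-path: sdk-nrf
      revision: v2.6.1      # placeholder: pin your actual NCS release
      # Import Zephyr and the Nordic modules at the versions this NCS
      # release was tested with, instead of listing dozens by hand.
      import: true
  self:
    path: app
```

&lt;p&gt;The &lt;code&gt;import: true&lt;/code&gt; line is doing the heavy lifting: one pinned &lt;code&gt;sdk-nrf&lt;/code&gt; revision transitively pins the whole dependency tree.&lt;/p&gt;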

&lt;p&gt;The caveat: T2 assumes &lt;strong&gt;one app = one manifest&lt;/strong&gt;. For a single product it is hard to beat, but the moment you need to develop Product A and Product B on the same hardware platform, the model starts cracking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: T3 Topology — Forest Structure for Multiple Apps
&lt;/h3&gt;

&lt;p&gt;The moment arrived when I had to develop "Terminal A" and "Terminal B" on the same platform hardware simultaneously. Creating separate T2 workspaces for each meant duplicating multi-gigabyte SDK folders per product.&lt;/p&gt;

&lt;p&gt;The answer was &lt;strong&gt;T3 Topology&lt;/strong&gt;. Zephyr's docs call it the &lt;strong&gt;Forest topology&lt;/strong&gt; — a dedicated manifest repository arranges multiple apps and the SDK as siblings at the same directory level.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_workspace/            # Workspace root (T3 anchor)
├── .west/               # West metadata
├── manifest_repo/       # Single repo holding west.yml (dependency hub)
├── app_product_a/       # Application 1
├── app_product_b/       # Application 2
├── zephyr/              # Zephyr core pulled by West
└── modules/             # nrf, hal, nrfxlib, etc.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The decisive difference from T2: &lt;strong&gt;the manifest lives outside any app&lt;/strong&gt;. A single &lt;code&gt;manifest_repo/west.yml&lt;/code&gt; governs the Zephyr version, module versions, and even the Git revisions of every app. &lt;code&gt;app_product_a&lt;/code&gt; and &lt;code&gt;app_product_b&lt;/code&gt; remain fully decoupled, yet at build time they safely share the same verified SDK within &lt;code&gt;my_workspace&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;T3 also pays off when setting up CI/CD pipelines (e.g., GitHub Actions). The CI server clones &lt;code&gt;manifest_repo&lt;/code&gt;, runs &lt;code&gt;west update&lt;/code&gt;, and every app plus its dependencies land at the correct version in one shot.&lt;/p&gt;
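&lt;p&gt;A rough sketch of that CI job — repo layout, board, and app names are placeholders, and a real pipeline also needs the Zephyr SDK/toolchain installed, which I omit here for brevity:&lt;/p&gt;

```yaml
# Hypothetical GitHub Actions job for a T3 workspace (toolchain setup omitted)
name: firmware-ci
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4        # clones manifest_repo
        with:
          path: manifest_repo
      - run: pip install west
      - run: west init -l manifest_repo  # register the local manifest repo
      - run: west update                 # pull apps + SDK at pinned revisions
      - run: west build -b nrf52840dk/nrf52840 app_product_a
```

&lt;p&gt;Because the manifest pins everything, the only repository CI needs to know about is &lt;code&gt;manifest_repo&lt;/code&gt; — &lt;code&gt;west update&lt;/code&gt; reconstructs the rest of the forest.&lt;/p&gt;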

&lt;h3&gt;
  
  
  Troubleshooting: The Curse of Windows Long Paths (260-char Limit)
&lt;/h3&gt;

&lt;p&gt;The moment I moved to a West Workspace structure (T2 or T3) and wired up CI, builds started failing:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;"No such file or directory"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NCS and Zephyr have deeply nested directory hierarchies. When CMake and Ninja generate build artifacts on top of that, the path easily blows past the Windows MAX_PATH limit of 260 characters. Unless you are using third-party tools that handle long paths natively, you will hit this.&lt;/p&gt;

&lt;p&gt;If you are building this structure on Windows, open an admin PowerShell and run this before anything else:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Disable the Windows 10/11 long path restriction (reboot recommended after)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I missed this one setting and spent half a day chasing phantom CMakeLists.txt errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qi24vn9tz8hgvuux93k.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qi24vn9tz8hgvuux93k.webp" alt="Freestanding vs T2 Star vs T3 Forest topology comparison — manifest ownership, app count, and SDK sharing differences" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key differences between the three topology options at a glance&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway: Which Structure Should You Pick?
&lt;/h2&gt;

&lt;p&gt;After ditching the Toolchain Manager's global install, we gained three weapons for NCS project management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no silver bullet.&lt;/strong&gt; Each structure carries clear trade-offs, and the right choice depends on your project's expected lifetime and team size.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Freestanding&lt;/th&gt;
&lt;th&gt;T2 (Star)&lt;/th&gt;
&lt;th&gt;T3 (Forest)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manifest&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;App = Manifest&lt;/td&gt;
&lt;td&gt;Separate manifest repo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;App count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Multiple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SDK version pinning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual (&lt;code&gt;ZEPHYR_BASE&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Pinned via &lt;code&gt;west.yml&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Pinned via &lt;code&gt;west.yml&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Initial setup cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near zero&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;One SDK copy per app&lt;/td&gt;
&lt;td&gt;One shared SDK copy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  1. Freestanding — When You Need Isolation Right Now
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Best for:&lt;/strong&gt; One-off side projects, quick sensor driver tests you plan to throw away, getting a build environment running in under 5 minutes on a personal machine.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Downside:&lt;/strong&gt; &lt;code&gt;west&lt;/code&gt; does not manage versions for you. You have to remember or document which SDK version you depended on, and manually bind the environment variable every time. Three months later — or when handing the project to a teammate — reproduction may fail.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. T2 (Star) — When You Are Building a Real Product
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Best for:&lt;/strong&gt; Serious single-product development where anyone should be able to &lt;code&gt;git clone&lt;/code&gt; and reproduce the exact build environment. A solo developer or small team focused on one firmware project.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Downside:&lt;/strong&gt; The entire SDK lives inside the app workspace, so adding a second product means duplicating the same Zephyr/NRF sources. Storage and &lt;code&gt;west update&lt;/code&gt; time scale linearly with product count.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. T3 (Forest) — When the Team Manages Multiple Products
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Best for:&lt;/strong&gt; Company-scale development with multiple products (A, B, C...) on the same hardware platform sharing common core logic. CI/CD pipeline integration is a must at this stage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Downside:&lt;/strong&gt; Significant learning curve for the initial &lt;code&gt;manifest_repo/west.yml&lt;/code&gt; setup and directory structure conventions. A manifest maintainer must mediate version conflicts across products.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My own path: I started with Freestanding for rapid prototyping, moved to T2 once the product was greenlit, and switched to T3 when derivative products appeared on the same platform. You do not need to start at T3. Binding the SDK with a single &lt;code&gt;zephyr-env.cmd&lt;/code&gt; call via Freestanding is enough. That alone is the first step toward reclaiming control in the closed NCS ecosystem.&lt;/p&gt;

</description>
      <category>firmware</category>
      <category>ncs</category>
      <category>nrf</category>
      <category>zephyr</category>
    </item>
    <item>
      <title>Zephyr CMake Hell is Dead: Why I Let Google Antigravity Write My Firmware</title>
      <dc:creator>Errata Hunter</dc:creator>
      <pubDate>Wed, 11 Mar 2026 21:41:32 +0000</pubDate>
      <link>https://dev.to/erratahunter/zephyr-cmake-hell-is-dead-why-i-let-google-antigravity-write-my-firmware-2j73</link>
      <guid>https://dev.to/erratahunter/zephyr-cmake-hell-is-dead-why-i-let-google-antigravity-write-my-firmware-2j73</guid>
      <description>&lt;p&gt;I just burned another weekend chasing a ghost I2C timeout that wasn't even in the datasheet. Decided it was time to ditch the proprietary vendor lock-in and migrate to the modern Zephyr RTOS. My reward? Welcome to &lt;code&gt;west&lt;/code&gt; and &lt;code&gt;CMakeLists.txt&lt;/code&gt; hell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52c93rg0yg4yb5f3kxbd.WEBP" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52c93rg0yg4yb5f3kxbd.WEBP" alt="Technical diagram of the Zephyr RTOS build pipeline illustrating how CMake processes Kconfig and Devicetree inputs to generate headers like autoconf.h and devicetree_generated.h before GCC compilation into zephyr.elf." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A high-level representation of Zephyr's build pipeline complexity. Before the HN pedants start arguing about exact CMake module dependencies in the comments—yes, this is simplified. The point is that you shouldn't need a PhD in build systems just to toggle a GPIO.&lt;/p&gt;

&lt;p&gt;Bare-metal firmware debugging is painful enough on its own. We shouldn't have to bleed over build system setups too. Web devs have AI agents scaffolding their entire async microservice architectures while we’re still grepping through 2010-era PDFs just to figure out a device tree path.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why I Brought Antigravity into the Embedded World&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I got sick of it. I decided to throw Google's new agent-centric IDE, Antigravity, at my firmware problems.&lt;/p&gt;

&lt;p&gt;The software world is moving at warp speed with AI, while the embedded industry remains a walled garden of bloated IDEs and slow iteration. We need to adapt. By offloading the massive barrier to entry—Kconfig labyrinths, linker scripts, and toolchain paths—to an AI agent, we can actually focus on what matters: the core logic and hardware interactions. I’m documenting this because we need to lower the barrier to entry in bare-metal engineering. Stop fighting the build system. Get your ideas out there.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Arming Antigravity: Mandatory &amp;amp; Recommended Extensions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;It’s not some bloated enterprise tool. Setup is dead simple.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Download the OS-specific build from &lt;a href="https://antigravity.google/download" rel="noopener noreferrer"&gt;antigravity.google&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; It’s a VS Code fork under the hood, so your muscle memory for shortcuts and UI remains intact.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But since we are forcing an AI agent to write firmware, we have to arm it with traditional weapons for hardware debugging. Go to the Extensions tab and install these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Mandatory Extensions (The Core)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The nRF Extensions Quirk:&lt;/strong&gt; In standard VS Code, you'd just lazy-install the &lt;code&gt;nRF Connect for VS Code Extension Pack&lt;/code&gt; and call it a day. However, Antigravity’s extension search currently doesn't index the consolidated pack. Don't panic. You just have to manually search and install its four horsemen individually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;nRF Connect for VS Code:&lt;/strong&gt; The absolute backbone for SDK, toolchain management, building, and debugging.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;nRF DeviceTree:&lt;/strong&gt; Visualizes and edits the nightmare that is device tree structures.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;nRF Kconfig:&lt;/strong&gt; GUI editor for project settings so you don't go blind reading Kconfig files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;nRF Terminal:&lt;/strong&gt; Serial and RTT logging terminal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Recommended Auxiliary Extensions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To actually read the code the agent spits out and survive debugging, you should install these for development efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Cortex-Debug (marus25.cortex-debug):&lt;/strong&gt; Provides ARM Cortex-M debugging capabilities. &lt;strong&gt;Do not skip this.&lt;/strong&gt; You can't debug bare-metal if you can't dump your hardware registers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;C/C++ (Microsoft):&lt;/strong&gt; Essential for code compilation, IntelliSense (autocomplete), and debugging support.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CMake Tools (Microsoft):&lt;/strong&gt; Manages the CMake build system, which is the beating heart of Zephyr.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Antigravity Workflow: Navigating the nRF Connect Sidebar&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once you have the extensions installed, you'll see the Nordic icon pop up on your left activity bar. Open it up, and you get three beautiful panels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Welcome:&lt;/strong&gt; This is your entry point. You use this panel to manage your SDK versions, set up toolchains, and create new projects from templates.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Application:&lt;/strong&gt; This shows the structure of your loaded Zephyr projects. It’s where you manage not just your source code, but also keep an eye on your device tree overlays and Kconfig (&lt;code&gt;prj.conf&lt;/code&gt;) files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Build:&lt;/strong&gt; The holy grail. You don't have to constantly wrestle with &lt;code&gt;west build&lt;/code&gt; commands in the terminal anymore. Once your board target is set, you just hit the &lt;strong&gt;"Build"&lt;/strong&gt; and &lt;strong&gt;"Flash"&lt;/strong&gt; buttons right here. Need to open the Kconfig GUI or start a Debug session? One click. It’s a completely frictionless workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What’s Next: NCS and Project Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The toolchain is ready, and the workflow is set. But before we unleash the AI to actually write our firmware, there’s a multi-gigabyte elephant in the room: installing the &lt;strong&gt;nRF Connect SDK (NCS)&lt;/strong&gt; and structuring your repository.&lt;/p&gt;

&lt;p&gt;If you dump your code inside the vendor SDK folder ("In-tree") like beginner tutorials suggest, your Git history will become a dumpster fire.&lt;/p&gt;

&lt;p&gt;In Part 2, I’ll walk you through taming the massive NCS installation without breaking your python paths. More importantly, we’ll deep-dive into the architectural holy war of Zephyr project management: &lt;strong&gt;Freestanding Applications vs. T2 Topology (Workspace)&lt;/strong&gt;. I’ll break down their pros, cons, exact use cases, and how to set them up so you don't hate yourself later. Stay tuned.&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>nrf52840</category>
      <category>zephyrrtos</category>
      <category>firmware</category>
    </item>
  </channel>
</rss>
