DEV Community

beefed.ai

Posted on • Originally published at beefed.ai

Testing, CI, and Validation Strategies for Reliable HALs

Hardware bring-up stalls when test strategy treats the HAL like ordinary application code. Symptoms you know well: long lab queues, one-off fixes that reappear on new boards, intermittent regressions that vanish when the engineer is watching, and test suites that take days to run. Those failures cost calendar time and credibility — and they’re avoidable when you build a layered validation strategy aligned to the HAL’s unique role as the thin, timing-sensitive translation layer between software intent and silicon behavior.

Contents

  • Unit vs Integration: Drawing the Boundary Where Bugs Really Live
  • Emulators, Mocks, and Hardware-in-the-Loop: Practical Patterns That Scale
  • CI for HAL: Pipelines That Validate Hardware Correctness at Commit Time
  • Test Metrics, Coverage, and Reliability Gates That Protect Releases
  • A Practical Test-Harness Framework and Checklist

Unit vs Integration: Drawing the Boundary Where Bugs Really Live

Treat the HAL like a collection of small, observable primitives and you’ll get testability for free. Unit tests should exercise behavior you can observe without real hardware: register-level writes, error handling, buffer management, and boundary conditions. Make those behaviors accessible by factoring hardware access behind small, mockable functions — e.g., hw_read32, hw_write32, delay_us, nvic_enable_irq. Then run the unit tests on your host machine using a lightweight framework like Unity/CMock or CppUTest to get sub-second feedback.

Integration tests validate the interactions that units assume: interrupt ordering, DMA handoffs, peripheral state machines, and endianness/byte-order on concrete targets. Those tests are slower and inherently less deterministic, so place them higher in your testing pyramid and use them to exercise contracts between layers rather than every low-level detail. The test-pyramid principle still applies: favor many fast, focused unit tests and far fewer broad integration runs.

Practical pattern: prefer a three-tier approach for HAL code

  • Small unit tests that run on host and mock hardware access (fast, deterministic).
  • In-memory hardware-model integration tests (medium speed): run real driver code against a software model of the device (virtual registers, timing stubs).
  • Full-system integration/HIL tests (slow): validate timing, analog behavior, electrical edge-cases on real hardware.
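For tier 2, the hardware-access boundary can be satisfied by an in-memory fake at link time: the same header is linked against real MMIO wrappers on target and a flat fake register file on the host. A minimal sketch, assuming the hypothetical hw_read32/hw_write32 names used throughout this article:

```c
/* hw_io.h (sketch) -- one header, two link-time implementations */
#include <stdint.h>
#include <string.h>

uint32_t hw_read32(uint32_t addr);
void     hw_write32(uint32_t addr, uint32_t value);

/* hw_io_fake.c (sketch) -- host-side in-memory register file for tier 2.
 * The target build links a real MMIO implementation instead. */
#define FAKE_REG_WORDS 256u
static uint32_t fake_regs[FAKE_REG_WORDS];

uint32_t hw_read32(uint32_t addr) {
    /* word-aligned addresses index the fake register file */
    return fake_regs[(addr / 4u) % FAKE_REG_WORDS];
}

void hw_write32(uint32_t addr, uint32_t value) {
    fake_regs[(addr / 4u) % FAKE_REG_WORDS] = value;
}

/* tests preload or clear device state between cases */
void fake_regs_reset(void) { memset(fake_regs, 0, sizeof fake_regs); }
```

Because the driver code never touches pointers to hardware directly, the same compiled object file runs unchanged against either implementation.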

Example: A minimal testable UART HAL interface and a unit test sketch.

/* hal_uart.h */
#ifndef HAL_UART_H
#define HAL_UART_H
#include <stdint.h>
#include <stddef.h>  /* size_t used by hal_uart_send */
typedef int32_t hal_status_t;
hal_status_t hal_uart_init(void);
hal_status_t hal_uart_send(const uint8_t *buf, size_t len);
#endif
/* hal_uart.c -- uses a tiny platform abstraction */
#include "hal_uart.h"
#include "hw_io.h"   // small wrappers: hw_write32(addr, value), hw_read32(addr), plus UART_* register definitions

hal_status_t hal_uart_send(const uint8_t *buf, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    while (!(hw_read32(UART_STATUS) & UART_TX_READY)) { /* spin */ }
    hw_write32(UART_TXFIFO, buf[i]);
  }
  return 0;
}

Unit test (host, with mocks generated by CMock):

#include "unity.h"
#include "mock_hw_io.h"   // generated mock for hw_io.h
#include "hal_uart.h"

void test_hal_uart_send_writes_fifo(void) {
  uint8_t data[] = {0xAA, 0x55};
  // Expect two status reads, then two writes
  hw_read32_ExpectAndReturn(UART_STATUS, UART_TX_READY);
  hw_write32_Expect(UART_TXFIFO, 0xAA);
  hw_read32_ExpectAndReturn(UART_STATUS, UART_TX_READY);
  hw_write32_Expect(UART_TXFIFO, 0x55);

  TEST_ASSERT_EQUAL_INT(0, hal_uart_send(data, 2));
}

Why this works: the HAL becomes a thin layer with observable side effects that you can assert against. Use Ceedling/Unity/CMock and you get automatic mock generation and host execution.

Emulators, Mocks, and Hardware-in-the-Loop: Practical Patterns That Scale

There’s no single answer for emulation vs HIL vs mocking — each tool solves a different problem. Use them together.

  • Mocks (fakes, stubs): fastest, used in unit tests to isolate your module from neighbors. Good for argument/interaction testing and verifying error paths. See CMock/Unity for C projects.
  • Emulators/Virtual Platforms (QEMU, Renode, Simics): run unmodified firmware images in a reproducible environment, suitable for integration tests and scripted regression. QEMU supports broad system emulation for many ARM boards and is great for Linux-level bring-up and many firmware images; Renode provides deterministic, multi-node simulation and is designed for embedded system co-development.
  • Hardware-in-the-loop (HIL): the only tool that exposes analog properties, electrical timing, and real sensor behavior — indispensable for final validation and safety certification in many domains. NI, dSPACE, and Simics-class virtual platforms are commonly used at scale for HIL test farms.

Compare at a glance:

| Technique | Strength | Typical use in HAL testing | Drawbacks |
| --- | --- | --- | --- |
| Mocking (CMock/fff) | Very fast, deterministic | Unit tests, interaction verification | Misses timing/analog behavior |
| Virtual platforms (QEMU) | Run unmodified images | Early firmware bring-up, system tests | Incomplete device coverage, board-specific gaps |
| Simulation frameworks (Renode) | Deterministic, multi-node | Regression of complex node interactions | Requires models for devices |
| HIL (PXI, LabVIEW, NI VeriStand) | Real analog/electrical fidelity | Final validation, fault injection, certification | Costly, lab scheduling bottleneck |

Contrarian insight: push more of your integration testing into deterministic simulation (Renode/QEMU) before scheduling HIL runs. Shorter feedback loops expose regressions earlier and reduce lab queue pressure. Use HIL deliberately for scenarios that require actual analog timing, electrical noise, or certification artifacts.

Practical pattern for device models: prefer an explicit, testable register-model layer that can be (a) a mock in unit tests, (b) a full software model in Renode for integration runs, or (c) the real hardware in HIL. Reuse the same high-level tests across these three contexts to maximize coverage with minimal duplication.
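One way to realize such a register-model layer, sketched with hypothetical names (this is not a Renode or vendor API): give each modeled register an optional on-write hook so writes can trigger device-like side effects.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal register model: each entry maps an address to a value and an
 * optional on-write hook that models a device side effect. */
typedef struct reg_model {
    uint32_t addr;
    uint32_t value;
    void (*on_write)(struct reg_model *self, uint32_t value);
} reg_model_t;

static reg_model_t *model_find(reg_model_t *regs, size_t n, uint32_t addr) {
    for (size_t i = 0; i < n; ++i)
        if (regs[i].addr == addr) return &regs[i];
    return NULL;
}

uint32_t model_read32(reg_model_t *regs, size_t n, uint32_t addr) {
    reg_model_t *r = model_find(regs, n, addr);
    return r ? r->value : 0u;   /* unmapped reads return 0 in this sketch */
}

void model_write32(reg_model_t *regs, size_t n, uint32_t addr, uint32_t value) {
    reg_model_t *r = model_find(regs, n, addr);
    if (!r) return;             /* unmapped writes are dropped */
    r->value = value;
    if (r->on_write) r->on_write(r, value);
}

/* Example hook: count writes to a modeled TX FIFO */
static uint32_t tx_writes;
static void uart_tx_hook(reg_model_t *self, uint32_t value) {
    (void)self; (void)value;
    ++tx_writes;
}
uint32_t model_tx_writes(void) { return tx_writes; }
```

In unit tests the table backs the mocked hw_* primitives; in integration runs the same table definitions can seed a fuller software model, so one register map serves both tiers.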

CI for HAL: Pipelines That Validate Hardware Correctness at Commit Time

A CI pipeline for a HAL needs multiple lanes and hardware-aware orchestration. At minimum, implement these jobs:

  1. Static checks and fast host unit tests (pre-submit): linters, clang-tidy, MISRA/CERT scans, and host-based Unity unit tests to give near-instant feedback. Failures block the PR.
  2. Cross-compiled smoke tests in emulation (post-commit): compile for the target and run the integration tests on Renode/QEMU. Use these to catch ABI/endianness and build-integration issues.
  3. Hardware regression (scheduled or on-demand, using self-hosted runners): push images to the lab, execute HIL scenarios, collect traces and JUnit-style logs.
  4. Nightly long-run soak and regression suite (HIL farm): run power-cycling, fault-injection, long-run throughput tests and store artifacts.

Implement a hardware lock system for shared benches: your job requests a bench lock, flashes the device, runs tests, archives logs, and releases the lock. Keep the bench-control layer versioned in the same repo and expose a small job library that your CI jobs call to standardize lab interaction.

Example skeleton GitHub Actions pipeline (illustrative):

name: HAL CI

on: [push, pull_request]

jobs:
  static-and-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install toolchain
        run: sudo apt-get update && sudo apt-get install -y build-essential ...
      - name: Run static analysis
        run: make static-check
      - name: Run host unit tests
        run: make test-host

  emulate:
    runs-on: ubuntu-latest
    needs: static-and-unit
    steps:
      - uses: actions/checkout@v4
      - name: Build target image
        run: make all TARGET=stm32
      - name: Run on Renode
        run: renode -e "s @script.resc"

  hil:
    runs-on: [self-hosted, hil-lab]
    needs: emulate
    steps:
      - uses: actions/checkout@v4
      - name: Flash and run HIL tests
        run: ./tools/bench/flash_and_run.sh build/target.bin --suite=regression

Use self-hosted runners tagged for each lab to control access and capacity. Store results in JUnit XML and persist artifacts (logs, waveform captures, trace files) to your artifact store for post-mortem analysis. GitHub Actions documentation provides the workflow syntax and hosted runner options.
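When a bench tool has to emit JUnit-style XML itself, a minimal writer is small. This sketch uses hypothetical type and field names and skips XML escaping for brevity:

```c
#include <stdio.h>

typedef struct {
    const char *name;
    int passed;
    const char *message;   /* failure message when passed == 0 */
} junit_case_t;

/* Write a minimal JUnit-style <testsuite> document to `out`;
 * returns the number of failing cases. (No XML escaping here.) */
int junit_write(FILE *out, const char *suite,
                const junit_case_t *cases, int n) {
    int failures = 0;
    for (int i = 0; i < n; ++i)
        if (!cases[i].passed) ++failures;

    fprintf(out, "<testsuite name=\"%s\" tests=\"%d\" failures=\"%d\">\n",
            suite, n, failures);
    for (int i = 0; i < n; ++i) {
        if (cases[i].passed)
            fprintf(out, "  <testcase name=\"%s\"/>\n", cases[i].name);
        else
            fprintf(out, "  <testcase name=\"%s\"><failure message=\"%s\"/></testcase>\n",
                    cases[i].name, cases[i].message);
    }
    fprintf(out, "</testsuite>\n");
    return failures;
}
```

A production version would escape attribute values and add timing per case, but even this shape is enough for most CI dashboards to ingest.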

Practical orchestration notes:

  • Keep the HIL job outside pre-submit for speed; run it on merge or nightly, and gate releases on passing HIL suites for the release branch.
  • For rapid triage, make emulator jobs run on every PR so the developer sees integration issues before merge.
  • Implement automatic retries for flaky infrastructure (not for tests): e.g., network or board power faults should be retried, but failing tests should trigger diagnostics before retries.
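The retry rule in the last bullet can be sketched as a tiny classifier loop (hypothetical names): infrastructure faults are retried up to a limit, while genuine test failures stop immediately and go to diagnostics.

```c
typedef enum {
    RUN_PASS,
    RUN_TEST_FAIL,   /* assertion failed on target: never retried */
    RUN_INFRA_FAIL   /* power/network/flash fault: eligible for retry */
} run_result_t;

/* run_fn abstracts one bench execution; *attempts reports how many
 * real executions happened. Only infrastructure faults are retried. */
run_result_t run_with_infra_retry(run_result_t (*run_fn)(void *ctx),
                                  void *ctx, int max_attempts, int *attempts) {
    run_result_t r = RUN_INFRA_FAIL;
    *attempts = 0;
    while (*attempts < max_attempts) {
        ++*attempts;
        r = run_fn(ctx);
        if (r != RUN_INFRA_FAIL) break;  /* pass or real test failure: stop */
    }
    return r;
}

/* Illustrative run functions */
static int flaky_calls;
static run_result_t flaky_infra(void *ctx) {
    (void)ctx;
    return (++flaky_calls < 2) ? RUN_INFRA_FAIL : RUN_PASS;
}
static run_result_t always_test_fail(void *ctx) { (void)ctx; return RUN_TEST_FAIL; }
```

The important property is the early break on RUN_TEST_FAIL: a failing test is reported once, with its logs intact, instead of being laundered through retries.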

Secure the lab: isolate bench-control networks, require runner tokens to be short-lived, and audit which job flashed which device and when. Use a simple REST service (bench orchestrator) that offers reserve, flash, run, and collect endpoints; keep it reproducible with containerized simulators for local dev.

Test Metrics, Coverage, and Reliability Gates That Protect Releases

You need signal, not noise. Track a small set of high-signal metrics and enforce pragmatic gates.

Key metrics to record:

  • Unit test pass rate (per PR) — target: 100% for tests in the PR; any failing unit test should block merge.
  • Cross-target build success rate (per commit) — ensures ABI/toolchain problems are caught.
  • Integration/HIL pass rate (per nightly run) — used for release gating and trend analysis.
  • Test flakiness rate — fraction of tests that produce non-deterministic outcomes over a rolling window. Google’s experience shows flakiness is a real, large-scale problem and needs active management.
  • Coverage (statement/branch/MC/DC) — use policy-based thresholds. For general firmware, require a minimum statement/branch target per module; for safety-critical modules, require standards-driven coverage (MC/DC for the highest integrity levels). Tooling vendors and safety guidance (ISO 26262 / DO-178C) prescribe structural coverage metrics for certification — plan for MC/DC where the standard or your domain demands it.
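The flakiness metric above can be computed from a rolling window of recorded outcomes. A sketch (hypothetical layout: one row of pass/fail outcomes per test) that counts a test as flaky when the same window contains both passes and failures:

```c
#include <stddef.h>

/* outcomes: n_tests rows of `window` entries, 1 = pass, 0 = fail.
 * A test is flaky if its window holds both outcomes. */
double flakiness_rate(const int *outcomes, size_t n_tests, size_t window) {
    size_t flaky = 0;
    for (size_t t = 0; t < n_tests; ++t) {
        int saw_pass = 0, saw_fail = 0;
        for (size_t r = 0; r < window; ++r) {
            if (outcomes[t * window + r]) saw_pass = 1;
            else                          saw_fail = 1;
        }
        if (saw_pass && saw_fail) ++flaky;
    }
    return n_tests ? (double)flaky / (double)n_tests : 0.0;
}
```

Publishing this one number per nightly run is enough to see a trend; the triage ticket threshold in the gate table below consumes exactly this kind of series.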

A practical gate table (example):

| Gate | When enforced | Metric | Action on failure |
| --- | --- | --- | --- |
| Pre-merge | On PR | static checks + host unit tests | Block merge |
| Post-merge | On main branch | emulator integration suite | Raise alert; block release if regression persists |
| Release | Before release build | HIL acceptance suite + coverage thresholds | Fail release candidate |
| Nightly | Daily | Long-run soak + flakiness trend | Auto-open triage ticket if trend exceeds threshold |

Flakiness handling — a guarded approach:

  • Retry failing tests automatically once (infrastructure faults only).
  • If failures persist, run diagnostics (collect logs, re-run on different bench, run narrowed tests).
  • Quarantine the test if it exhibits flaky behavior across environments and create a remediation ticket. But don’t blind-quarantine every flaky test: a study on Chromium CI shows that flaky tests can reveal regressions; ignoring them wholesale masks faults. Triage flakiness with root-cause analysis rather than blanket suppression.

Coverage expectations by domain:

  • Non-safety consumer firmware: aim for 60–85% unit coverage, with focused integration tests for complex state machines.
  • Automotive/medical/avionics safety-critical components: follow the relevant standard — ISO 26262 and DO-178C require structural coverage analysis (statement/branch/MC/DC) for high ASIL/DAL levels. Plan tooling to produce traceability between requirements, tests, and coverage artifacts.

Instrument your CI to publish these metrics (Grafana dashboards, annotated PR statuses) so the team sees trends, not just pass/fail noise.

Important: A passing HIL suite is necessary but not sufficient; your CI artifacts (traces, logs, coverage reports) must be archived and linked to each release for forensic analysis and certification evidence.

A Practical Test-Harness Framework and Checklist

Below is a portable test-harness architecture and a step-by-step checklist you can adopt immediately.

Test-harness architecture (components)

  • Platform abstraction layer: small, testable functions (hw_read32, hw_write32, power_control, reset) implemented as link-time pluggable modules.
  • Unit test harness: host-executable harness (Unity/CMock) + coverage instrumentation.
  • Emulation runner: scripts to boot firmware in Renode/QEMU, collect logs, and convert output to JUnit XML.
  • Bench orchestrator: REST service to reserve benches, flash firmware, run scenarios, capture traces, and release resources.
  • Result collector: stores logs, waveform captures, and coverage reports; exposes search and diff tools for regression triage.

Minimal test-harness API (header-sketch)

/* test_harness.h */
int harness_reserve_device(const char *board_tag, int timeout_s);
int harness_flash_image(const char *device_id, const char *image_path);
int harness_run_test(const char *device_id, const char *suite_name, const char *output_junit);
int harness_release_device(const char *device_id);
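A hedged usage sketch of that API, with stub implementations standing in for the real orchestrator. It assumes the caller already knows the device id for the reserved board, and it guarantees the bench is released on every path after a successful reservation:

```c
/* Stub orchestrator calls standing in for the real implementations;
 * all return 0 on success, negative on failure. */
static int harness_reserve_device(const char *board_tag, int timeout_s) {
    (void)board_tag; (void)timeout_s; return 0;
}
static int harness_flash_image(const char *device_id, const char *image_path) {
    (void)device_id; (void)image_path; return 0;
}
static int harness_run_test(const char *device_id, const char *suite_name,
                            const char *output_junit) {
    (void)device_id; (void)suite_name; (void)output_junit; return 0;
}
static int harness_release_device(const char *device_id) {
    (void)device_id; return 0;
}

/* Run one suite end-to-end; the bench is released on every path. */
int run_suite_on_bench(const char *board_tag, const char *device_id,
                       const char *image, const char *suite) {
    int rc = harness_reserve_device(board_tag, 120);
    if (rc != 0) return rc;           /* nothing reserved, nothing to release */

    rc = harness_flash_image(device_id, image);
    if (rc == 0)
        rc = harness_run_test(device_id, suite, "results/junit.xml");

    int rel = harness_release_device(device_id);  /* even after a failed flash/run */
    return rc != 0 ? rc : rel;
}
```

The unconditional release is the point: a CI job that aborts mid-run must never leak a bench, or the lab-queue problem this article opened with comes right back.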

Step-by-step protocol to add a platform to CI

  1. Factor hardware access behind small functions in the HAL (register access, clock control, reset).
  2. Write host-unit tests for pure logic (use Unity/CMock). Ensure they run on your laptop and in CI.
  3. Add a software register-model for the device and run the same integration tests under Renode/QEMU to catch system-level issues early.
  4. Implement a bench-orchestrator job to flash and run the HIL scenario; add a lab-run job that runs on self-hosted runners and archives artifacts.
  5. Define reliability gates (unit pass, emulator pass) and enforce HIL acceptance for release branches.
  6. Track metrics (coverage, flakiness, MTTD/MTTR) and enforce triage SLAs when thresholds are exceeded.

Practical checklist (copy into your project README)

  • [ ] HAL surface is small and mockable (hw_* primitives).
  • [ ] Unit tests for every error path; run on host and in CI.
  • [ ] Integration tests run reproducibly in Renode/QEMU and are triggered on merge.
  • [ ] HIL test suites defined, scripted, and runnable via bench orchestrator.
  • [ ] Coverage reports and JUnit XML are generated and archived for every pipeline run.
  • [ ] Flaky-test dashboard exists; flaky tests have triage tickets and quarantine policy.

Sample small test-runner snippet (Python) to flash and collect JUnit:

# tools/bench/flash_and_run.py
import subprocess, sys, requests

def flash(device, image):
    # openocd or vendor flasher
    subprocess.run(["openocd", "-f", "board.cfg", "-c", f"program {image} verify reset; exit"], check=True)

def run(device, suite):
    r = requests.post("http://lab-orchestrator/run", json={"device": device, "suite": suite})
    return r.json()["result_url"]

if __name__ == '__main__':
    device = sys.argv[1]
    image = sys.argv[2]
    suite = sys.argv[3]
    flash(device, image)
    print(run(device, suite))

Operational example: a nightly job reserves five benches, runs a matrix of temperature/voltage/fault-injection scenarios, stores traces, and posts a summary report to the release board. Use artifact retention for at least the life of the sprint (or longer for certified builds).

Sources:
Throw The Switch — Unity, CMock, Ceedling - Unit testing and mock generation tools commonly used in embedded C, used here for the Unity/CMock pattern and mock-based unit testing examples.

The Test Pyramid — Martin Fowler - Conceptual guidance on test-layer balance (unit vs integration vs end-to-end) used to justify test-layer distribution.

Renode — Antmicro - Deterministic embedded system simulation framework recommended for reproducible integration testing and multi-node scenarios.

QEMU System Emulation Documentation - System-level emulation for running unmodified firmware images and early platform bring-up.

GitHub Actions documentation — Continuous integration - Example workflow syntax and hosted/self-hosted runner model referenced for CI design and pipeline examples.

Flaky Tests at Google and How We Mitigate Them — Google Testing Blog - Empirical evidence on test flakiness prevalence and mitigation strategies.

How to Use Simulink for ISO 26262 Projects — MathWorks - Guidance on structural coverage expectations (statement/branch/MC/DC) for functional safety which informs coverage gating.

Hardware-in-the-Loop (HIL) Testing — National Instruments - Industrial HIL architecture and examples used to justify HIL for electrical/analog fidelity.

Wind River Simics — Virtual platform simulation for embedded systems - Virtual platform and full-system simulation capability referenced as an industry-grade virtual-platform option.

IAR Embedded — Embedded CI/CD tools and guidance - Embedded CI/CD patterns for cross-compilation, toolchain integration, and scaled testing (used for pipeline architecture signals).

ISO 26262 Structural Coverage Discussion — Rapita Systems - Practical mapping of coverage metrics to ASIL levels and verification activities used to justify MC/DC planning.

The Importance of Discerning Flaky from Fault-triggering Test Failures — Chromium CI study - Evidence that flaky tests can still reveal real faults and the danger of over-suppressing flakiness.

Put the scaffolding in place, then protect it with disciplined CI and metric-driven gates: small, mockable primitives; host-executable unit suites; deterministic emulation; and scheduled HIL runs. The work upfront shortens bring-up from weeks to days, reduces lab contention, and makes regressions traceable — those are the returns that pay back on every new board.
