Stan Marsh

5 Common Mistakes in Embedded C Firmware (And How to Actually Fix Them)

Firmware bugs don't crash gracefully. They freeze your device at 3 AM in production, corrupt sensor data silently, or drain a battery in hours instead of months. Here are 5 mistakes I see repeatedly, along with what to do instead.

01 Ignoring volatile on hardware-mapped variables

This one bites engineers who are solid in application C but new to bare-metal. The compiler doesn't know your hardware peripheral can change a register behind its back, so it happily caches the value in a CPU register and never re-reads memory.

Bad
uint8_t *status_reg = (uint8_t *)0x40001000;
while (*status_reg == 0) {} // optimizer may hoist this out of the loop
Fixed
volatile uint8_t *status_reg = (volatile uint8_t *)0x40001000;
while (*status_reg == 0) {} // re-reads on every iteration

Always qualify hardware register pointers, ISR-shared variables, and DMA buffers with volatile. This isn't just a good habit: at -O2 and above, omitting it can produce silent, incorrect behavior.
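
A common way to make the qualifier impossible to forget is to wrap each register in a macro, so every access goes through a volatile-qualified lvalue. A minimal sketch, reusing the address from the example above:

#define STATUS_REG (*(volatile uint8_t *)0x40001000)

while (STATUS_REG == 0) {} // every access re-reads the hardware register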

Pro tip: Use a static analysis tool like PC-lint or Cppcheck to flag missing volatile qualifiers automatically.

02 Stack overflows with no detection in place

Embedded systems often have kilobytes of RAM, not megabytes. Recursion, large local arrays, or deep call chains can quietly overflow the stack into the heap or BSS. The symptom? Random, unreproducible crashes that only appear on certain inputs.

Risky
void process_packet(uint8_t *buf) {
    uint8_t temp[512]; // local buffer eating half your stack
    memcpy(temp, buf, 512);
    // ...
}
Better
static uint8_t temp[512]; // static: placed in BSS, not stack

void process_packet(uint8_t *buf) {
    memcpy(temp, buf, 512); // note: the static buffer makes this non-reentrant
    // ...
}

Paint your stack memory with a canary pattern at boot (e.g., 0xDEADBEEF) and check it periodically, such as from your watchdog handler; a sketch follows. Many RTOSes, FreeRTOS included, have built-in stack high-water marks; enable them during development.
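
A minimal sketch of the canary approach, assuming a linker-provided symbol _sstack marking the lowest stack address (the actual symbol name depends on your linker script):

#include <stdint.h>

extern uint32_t _sstack; // assumed linker symbol: bottom of the stack
#define STACK_CANARY 0xDEADBEEFu
#define CANARY_WORDS 16 // depth of the guard zone to paint

void stack_paint(void) { // call once, early in boot
    uint32_t *guard = &_sstack;
    for (int i = 0; i < CANARY_WORDS; i++) {
        guard[i] = STACK_CANARY;
    }
}

int stack_overflowed(void) { // call periodically, e.g. from the watchdog handler
    uint32_t *guard = &_sstack;
    for (int i = 0; i < CANARY_WORDS; i++) {
        if (guard[i] != STACK_CANARY)
            return 1; // canary clobbered: the stack grew into the guard zone
    }
    return 0;
}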

Pro tip: Enable -fstack-usage in GCC to get per-function stack consumption reports at compile time.

03 Blocking delays inside interrupt service routines

An ISR must be fast. Putting HAL_Delay() or a polling loop inside an ISR is one of the most common beginner mistakes, and one of the hardest to debug because everything looks fine until it doesn't.

Bad
void EXTI0_IRQHandler(void) {
    HAL_Delay(50); // NEVER do this — blocks the CPU, may miss other IRQs
    process_button_press();
    HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_0);
}
Better
volatile uint8_t button_flag = 0;

void EXTI0_IRQHandler(void) {
    button_flag = 1; // set a flag, do real work in main loop
    HAL_GPIO_EXTI_IRQHandler(GPIO_PIN_0);
}

Keep ISRs to: set a flag, push to a queue, clear the interrupt source. Do the heavy processing in your main loop or a task. This is the "deferred processing" pattern, and it will save you from a lot of hard-to-reproduce bugs.
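
The consuming side is just the main loop draining the flag. A minimal sketch using button_flag from above:

int main(void) {
    // ... clock, GPIO, and EXTI init ...
    while (1) {
        if (button_flag) {
            button_flag = 0;        // consume the flag first
            process_button_press(); // heavy work runs here, not in the ISR
        }
        // ... other work ...
    }
}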

04 Signed/unsigned mismatches in peripheral math

C's implicit type promotion rules are counterintuitive. Mixing int8_t with uint16_t in arithmetic without explicit casts leads to sign-extension bugs that appear only on specific sensor readings or ADC values — which makes them feel like hardware issues, not software bugs.

Subtle bug
int8_t offset = -10;
uint16_t raw_adc = 5;
uint16_t result = raw_adc + offset; // sum is -5, but stored unsigned as 65531: wraps!
Explicit
int16_t result = (int16_t)raw_adc + (int16_t)offset; // safe: -5

Enable -Wsign-conversion and -Wconversion in your compiler flags. They're noisy at first but surface exactly these hidden issues before they reach hardware.

05 No watchdog, or a watchdog petted unconditionally in an infinite loop

A watchdog timer is your last line of defense against a firmware lockup in production. But many teams either skip it ("we'll add it before release") or pet it unconditionally in main(), which defeats its entire purpose.

Useless watchdog
while (1) {
    HAL_IWDG_Refresh(&hiwdg); // always refreshed — even if tasks are frozen
    run_tasks();
}
Meaningful watchdog
volatile uint8_t task_checkin = 0; // shared across tasks, so: volatile

void watchdog_check(void) {
    if (task_checkin == EXPECTED_MASK) { // EXPECTED_MASK: one bit per task (see sketch below)
        HAL_IWDG_Refresh(&hiwdg); // only refresh if ALL tasks ran
        task_checkin = 0;
    }
    // else: system resets on next IWDG timeout
}

Each task sets its bit in task_checkin after completing a cycle. The watchdog is refreshed only when every task has checked in, so a single hung task lets the IWDG time out and recover the system.
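
A sketch of the check-in side; the task names and bit assignments here are illustrative, not from any particular codebase:

#define TASK_SENSOR_BIT (1u << 0)
#define TASK_COMMS_BIT  (1u << 1)
#define EXPECTED_MASK   (TASK_SENSOR_BIT | TASK_COMMS_BIT)

void sensor_task_step(void) {
    // ... read and filter the sensor ...
    task_checkin |= TASK_SENSOR_BIT; // prove this task completed a full cycle
}

Note that |= is a read-modify-write: if tasks can preempt each other, wrap the check-in in a critical section or use an atomic OR.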

Pro tip: Log the reset cause in non-volatile memory on boot. If you see unexpected watchdog resets in the field, you'll have a breadcrumb trail.
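
On an STM32 with the HAL (which the examples above already use), checking the cause at boot takes a few lines; log_reset_cause() and its argument are hypothetical hooks into your own non-volatile logging:

void record_reset_cause(void) {
    if (__HAL_RCC_GET_FLAG(RCC_FLAG_IWDGRST)) {
        log_reset_cause(RESET_CAUSE_IWDG); // hypothetical NV-memory logger
    }
    __HAL_RCC_CLEAR_RESET_FLAGS(); // reset flags persist until explicitly cleared
}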

These mistakes aren't signs of a bad engineer — they're signs of a discipline with genuinely sharp edges. Embedded C gives you enormous power and almost no safety net.

If your team is scaling firmware development and needs more structured review processes, tooling choices, or architecture guidance, partnering with an experienced Embedded Software Development Company can help you build reviewable, maintainable, and testable firmware from the ground up — not as an afterthought.

Which of these have you hit in your own projects? Drop a comment — especially if you have a creative workaround I haven't covered.

Top comments (4)

mote

The binding problem you describe is essentially the same challenge we ran into when building memory systems for embodied AI (robots/drones that need to remember things across missions). Your finding that "implement [target]" is vacuous really resonates.

What worked for us was moving from skill abstraction to episodic context binding — instead of storing "how to do X", we store structured episodes: {context: "loaded 2GB CSV into staging with encoding errors", outcome: "used pandas error_handlers='ignore', lost 3% rows but pipeline completed", trigger_pattern: "large_file + schema_mismatch"}. The key insight was that the failure mode is more valuable than the success pattern. When the agent encounters a similar context fingerprint, it recalls the episode with its failure metadata attached, not a sanitized "best practice."

One thing I'm curious about: did you experiment with confidence-gated recall? We found that forcing recall on every task actually hurt performance more than having no memory at all. A simple threshold (only recall when cosine similarity > 0.82 for context embedding) cut the noise dramatically. The agent voluntarily ignoring memory turned out to be a feature, not a bug.

Stan Marsh

This is a genuinely fascinating parallel — I hadn't framed it as a "binding problem" explicitly, but that's exactly what it is. The gap between storing a skill abstraction and knowing when and whether to invoke it is where most memory systems quietly fall apart.
Your episodic context binding approach makes a lot of sense. The structured episode format, especially anchoring on failure metadata rather than sanitized success patterns, is a meaningful shift. It mirrors how humans actually encode useful memories: not "pandas can handle encoding errors" but "I was in this exact mess, tried this, lost 3% of data, and the pipeline survived." The failure context is load-bearing information that abstract best practices strip out.
To your question about confidence-gated recall: I didn't implement a formal threshold during the experiments, but I observed the same pathology informally. When recall was forced on every task, the agent would surface loosely related episodes and anchor on them even when counterproductive, a kind of false familiarity. It performed worse than baseline in those cases.
The cosine similarity threshold of 0.82 is interesting: did you arrive at it empirically across a range of tasks, or is it domain-specific to your robotics/drone context? I'm wondering whether the right threshold is stable across domains or needs calibration per task-type. There's also an interesting question of what you're embedding — the raw context string, a structured feature vector, or both?
The framing of "the agent voluntarily ignoring memory as a feature" is something I want to steal. It reframes the design goal cleanly: the memory system shouldn't be an obligation the agent fulfills, it should be a resource it consults when confidence justifies it. That's a much healthier mental model for building this out.

mote

Solid list. The volatile one is probably the most insidious in practice because it's a correctness issue that only shows up at higher optimization levels — which is exactly when you're doing a release build for production, not during debug.

The stack overflow point is worth expanding: the real trap I've seen is DMA buffers placed on the stack combined with cache alignment requirements. You end up with a 64-byte aligned 512-byte buffer on the stack and a cache flush, and you've just silently corrupted the stack in ways that manifest as random ISR failures days later.

One thing I'd add to the list: blocking inside ISRs. It sounds obvious, but I've watched experienced firmware engineers do it with FreeRTOS mutexes because the API signature looks synchronous. The result is a priority inversion that only triggers under specific load conditions.

On the Rust side — the ownership model makes volatile issues nearly impossible (you can't have two mutable references to a register), and the type system catches a lot of the stack size problems at compile time. Have you worked on projects where the legacy C firmware complexity made a Rust rewrite tempting, or does the toolchain ecosystem still feel too immature for the target hardware?

Stan Marsh

The DMA + cache alignment point is a great catch; it's three individually understood concepts combining into something catastrophic, and the delayed ISR symptoms make it nearly impossible to attribute without already knowing to look for it.
On FreeRTOS mutexes in ISRs: the API is the trap. xSemaphoreTake() looks synchronous, and nothing in its signature signals that calling it from ISR context is wrong. The correct call is xSemaphoreTakeFromISR(), which is easy to miss under deadline pressure.
On Rust: genuinely tempted, especially for greenfield Cortex-M projects. The ownership model making volatile aliasing a compile-time error is hard to argue with. The real friction on legacy rewrites isn't the language, it's vendor HAL dependencies. If your MCU vendor doesn't ship Rust bindings (most don't, outside Nordic), you're writing FFI wrappers or reimplementing drivers from scratch.
Honest take: Rust is production-ready for embedded. The ecosystem question is really a "which MCU" question now, not a general maturity problem.