Powered by IIES Institute Bangalore
Motor controller. Production hardware. The system would run perfectly for hours, then suddenly freeze — no crash, no fault handler, just... silence. The watchdog would reset everything and we'd be back in business. For about six hours.
It took me two days to figure out it was a priority inversion. A low-priority logging task was holding a mutex that a high-priority motor task needed. A medium-priority CAN handler kept preempting the logger. The motor task starved. The watchdog fired.
I'd been writing RTOS code for a year at that point. I thought I understood scheduling. I did not. This post is everything I learned the hard way — written for you, so you don't have to spend two days staring at a logic analyzer.
<10µs: context switch on Cortex-M4 @ 168MHz
O(1): task selection, a single CLZ instruction
69.3%: max CPU utilization under RMA (n→∞)
First, Let's Kill a Myth
The most common thing I see new embedded engineers get wrong: they think an RTOS makes their system faster. It doesn't. It makes timing predictable. That's a completely different thing, and it's the whole point.
On a bare-metal superloop, everything runs in sequence. Your 100ms display refresh sits right next to your 1ms motor update. You're always one long function away from missing a deadline. You can go fast — but you can't go on time.
An RTOS gives you a contract: 'A task with sufficient priority will preempt whatever is running within a bounded window.' That bound is what makes real-time systems work. Not speed. Predictability.
Think of the RTOS scheduler as a function that runs at every tick and after every blocking call, answering one question: 'Which task should own the CPU right now?' In a fixed-priority system, the answer is always: 'the highest-priority task that is Ready.' Everything else is just implementation detail.
The moment you need to service a CAN interrupt, update motor PWM, and refresh a display — all with distinct timing requirements — you need a scheduler. That's it. If your timing requirements are all the same, a superloop is fine. The second they diverge, you need this.
What the Scheduler Actually Does
Most tutorials show you a pretty diagram of task states and then show you xTaskCreate(). That's skipping the good part. Let me show you the actual scheduler code — FreeRTOS, stripped to its bones.
tasks.c — vTaskSwitchContext (simplified)
C
/* This is the entire scheduler. Seriously. */
void vTaskSwitchContext( void )
{
    UBaseType_t uxTopPriority;

    /* Find the highest-priority bit set in the ready-list bitmap */
    portGET_HIGHEST_PRIORITY( uxTopPriority, uxTopReadyPriority );

    /* Pick the next task at that priority level
       (tasks of equal priority take turns) */
    listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB,
                                 &( pxReadyTasksLists[ uxTopPriority ] ) );
}

/* On ARM Cortex-M, with configUSE_PORT_OPTIMISED_TASK_SELECTION set to 1,
   portGET_HIGHEST_PRIORITY expands to:
       uxTopPriority = ( 31UL - __clz( uxReadyPriorities ) );
   That's it. One CLZ instruction. O(1). No matter how many tasks.
   This is why picking the next task never gets slower as you add tasks. */
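If you want to poke at the bitmap trick on your desk, here is a minimal host-side sketch. It is not FreeRTOS code: `ready_set`, `ready_clear` and `highest_ready` are names I made up for illustration, and `__builtin_clz` is the GCC/Clang builtin standing in for the CLZ instruction.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per priority level: bit N set means "some task at priority N is Ready". */
void ready_set(uint32_t *bitmap, unsigned prio)
{
    assert(prio < 32u);
    *bitmap |= (UINT32_C(1) << prio);
}

void ready_clear(uint32_t *bitmap, unsigned prio)
{
    assert(prio < 32u);
    *bitmap &= ~(UINT32_C(1) << prio);
}

/* Highest ready priority = index of the most significant set bit. */
unsigned highest_ready(uint32_t bitmap)
{
    assert(bitmap != 0u);   /* __builtin_clz(0) is undefined */
    return 31u - (unsigned)__builtin_clz(bitmap);
}
```

Set bits 3 and 7 and `highest_ready()` returns 7; clear bit 7 and it returns 3. The cost is the same whether one bit is set or all thirty-two.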
Task States — Draw This on Paper
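Seriously, take five minutes and draw the task state machine on paper. More RTOS bugs come from not having this mental model clearly loaded than from any API misuse. In FreeRTOS there are four states: Running (owns the CPU), Ready (could run, but a task of equal or higher priority is running), Blocked (waiting for a timeout or an event such as a queue or semaphore), and Suspended (parked by vTaskSuspend() until someone calls vTaskResume()).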
The thing people get wrong: they think 'BLOCKED' means the task is stuck. It's not stuck — it's efficiently parked. A blocked task consumes zero CPU. It's sitting in a list, waiting for something specific. The scheduler doesn't even look at it until that event fires.
This is why RTOS-based systems can run dozens of tasks on a Cortex-M4 with sub-millisecond response times and still have 90% CPU headroom. Most tasks are blocked most of the time. The scheduler only runs tasks that have something to do.
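If drawing isn't your thing, the same state machine can be written down as a table of legal moves. This is a sketch of my own mental model, not kernel code: the enum and both function names are hypothetical, and it only covers the four states of a task that has not been deleted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED, TASK_SUSPENDED } TaskState;

/* Returns true if a task may move directly from `from` to `to`. */
bool transition_allowed(TaskState from, TaskState to)
{
    switch (from) {
    case TASK_READY:     /* the scheduler picks it, or someone suspends it */
        return to == TASK_RUNNING || to == TASK_SUSPENDED;
    case TASK_RUNNING:   /* preempted, blocks on an API call, or suspended */
        return to == TASK_READY || to == TASK_BLOCKED || to == TASK_SUSPENDED;
    case TASK_BLOCKED:   /* the event or timeout arrives, or someone suspends it */
        return to == TASK_READY || to == TASK_SUSPENDED;
    case TASK_SUSPENDED: /* a resume brings it back, and only to Ready */
        return to == TASK_READY;
    }
    return false;
}

/* Applies a move and traps illegal ones in debug builds. */
TaskState change_state(TaskState from, TaskState to)
{
    assert(transition_allowed(from, to));
    return to;
}
```

The move worth noticing is the one that is missing: nothing goes from Blocked straight to Running. An unblocked task becomes Ready, and the scheduler decides whether it runs.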
The Tick Rate Trap (I've Seen This a Dozen Times)
vTaskDelay(1) does not delay for 1 millisecond. It delays for 1 tick. If your tick rate is 100Hz, that's 10ms. Always use pdMS_TO_TICKS(1) and always know what your tick rate is configured to. I once inherited a codebase where someone had set configTICK_RATE_HZ to 10 — every vTaskDelay(100) was actually a 10-second delay. Fun to debug.
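The arithmetic is easy to check on a host. The sketch below uses two hypothetical helpers, `ms_to_ticks` and `ticks_to_ms`, with the tick rate passed in as a parameter. As far as I know the default pdMS_TO_TICKS() definition uses the same integer division, so it rounds down too, but check the macro in your own tree before relying on that.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds to ticks with integer division, so the result rounds down. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    assert(tick_rate_hz > 0u);
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}

/* How long a delay of `ticks` really lasts, in milliseconds. */
uint32_t ticks_to_ms(uint32_t ticks, uint32_t tick_rate_hz)
{
    assert(tick_rate_hz > 0u);
    return (uint32_t)(((uint64_t)ticks * 1000u) / tick_rate_hz);
}
```

`ticks_to_ms(100, 10)` is 10000, which is the 10-second surprise from that inherited codebase. The other direction has its own trap: `ms_to_ticks(1, 100)` is 0, so at a 100Hz tick a "1ms" delay converts to zero ticks.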
Context Switching in Assembly
This is the part most tutorials skip. A context switch isn't magic — it's assembly code that saves every CPU register from the current task onto its stack, calls the scheduler to pick the next task, then restores every register from the new task's stack. That's it.
On ARM Cortex-M, the hardware helps you out. When an interrupt fires (including the PendSV interrupt the RTOS uses for scheduling), the CPU automatically pushes 8 registers onto the current stack before jumping to your ISR. The RTOS just has to handle the rest.
port.c — PendSV_Handler, ARM Cortex-M4F (annotated)
ASM
PendSV_Handler:
    ; Hardware already saved: xPSR, PC, LR, R12, R3, R2, R1, R0
    ; Those 8 are free. Now we save the rest.
    MRS      R0, PSP                ; Get current task's Process Stack Pointer
    LDR      R3, =pxCurrentTCB      ; Load address of current TCB pointer
    LDR      R2, [R3]               ; R2 = current TCB
    TST      R14, #0x10             ; EXC_RETURN bit 4 clear = task used the FPU
    IT       EQ
    VSTMDBEQ R0!, {S16-S31}         ; Save FPU registers (only if task used FPU)
    STMDB    R0!, {R4-R11, R14}     ; Save R4-R11 + EXC_RETURN value
    STR      R0, [R2]               ; Save updated stack pointer back into TCB
    ; ↑ Current task is now fully frozen. Stack holds its entire world.
    STMDB    SP!, {R0, R3}          ; R3 is caller-saved: keep it across the call
    BL       vTaskSwitchContext     ; Pick the next task (updates pxCurrentTCB)
    LDMIA    SP!, {R0, R3}          ; (the real port also masks interrupts
                                    ;  with BASEPRI around this call)
    ; ↓ Restore the new task. It was frozen exactly like this at some point.
    LDR      R1, [R3]               ; R1 = new TCB (pxCurrentTCB was updated)
    LDR      R0, [R1]               ; R0 = new task's saved stack pointer
    LDMIA    R0!, {R4-R11, R14}     ; Restore R4-R11 + EXC_RETURN
    TST      R14, #0x10             ; Did the new task use the FPU?
    IT       EQ
    VLDMIAEQ R0!, {S16-S31}         ; Restore FPU registers (only if it did)
    MSR      PSP, R0                ; Update Process Stack Pointer
    BX       R14                    ; Return from exception.
    ; Hardware restores the other 8 registers.
    ; CPU is now running the new task as if
    ; it was never interrupted. Magic. (It's not magic.)
The Task Control Block (TCB) is a struct whose first member is always the saved stack pointer. That's not an accident — it means the assembly above can find it at offset zero without needing to know anything else about the struct layout. Clean engineering.
Stack Sizing Is Not a Guess
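Each task needs enough stack for its own local variables, its call chain, AND the full context save shown above. For a task that never touches the FPU that is 17 core registers (8 stacked by hardware, 9 by the handler) = 68 bytes. A task that uses the FPU adds the 16 software-saved registers S16-S31 plus the 18 extra words the hardware stacks for it (S0-S15, FPSCR and a reserved word), for 51 words = 204 bytes just for the context frame, before your code does anything. Undersize the stack and you get silent memory corruption: the overflow writes into whatever is adjacent in RAM. Enable configCHECK_FOR_STACK_OVERFLOW 2 in every development build. Every one.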
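Here is the frame arithmetic as a small calculator, under my own accounting for a Cortex-M4F: the word counts include the extra registers the hardware stacks when a task has used the FPU (S0-S15, FPSCR and one reserved word). The enum and function names are hypothetical, and the call-chain and margin figures are inputs you have to measure or estimate yourself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    HW_BASIC_FRAME_WORDS = 8,   /* R0-R3, R12, LR, PC, xPSR        */
    SW_CORE_WORDS        = 9,   /* R4-R11 + EXC_RETURN (R14)        */
    HW_FP_FRAME_WORDS    = 18,  /* S0-S15, FPSCR, one reserved word */
    SW_FP_WORDS          = 16   /* S16-S31                          */
};

/* Bytes of stack consumed by one saved context. */
size_t context_frame_bytes(bool task_uses_fpu)
{
    size_t words = HW_BASIC_FRAME_WORDS + SW_CORE_WORDS;
    if (task_uses_fpu)
        words += HW_FP_FRAME_WORDS + SW_FP_WORDS;
    return words * 4u;
}

/* Stack depth to request, in 32-bit words: frame + call chain + margin. */
size_t min_stack_words(size_t call_chain_bytes, bool task_uses_fpu,
                       size_t margin_bytes)
{
    /* Refuse a zero margin: the call-chain figure is only an estimate. */
    assert(margin_bytes > 0u);
    size_t bytes = context_frame_bytes(task_uses_fpu)
                 + call_chain_bytes + margin_bytes;
    return (bytes + 3u) / 4u;   /* round up to whole words */
}
```

With these counts, `context_frame_bytes(false)` is 68 and `context_frame_bytes(true)` is 204. A 300-byte call chain with a 128-byte margin on an FPU task comes out at 158 words.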
Priority Inversion — Back to My Bug
Remember the motor controller bug from the intro? Let me walk you through exactly what happened — because understanding this scenario saves careers.
In 1997, the Mars Pathfinder landed on Mars. Within a few days, the system started resetting itself. Telemetry showed a watchdog timeout. NASA engineers pored over the data, running the same software on Earth, trying to reproduce it. Eventually they found it: a priority inversion between a low-priority meteorological task holding an information bus mutex, a medium-priority communications task preempting it repeatedly, and a high-priority bus manager task starving as a result. The fix — enabling priority inheritance in VxWorks, a single config flag — was uploaded to a spacecraft 190 million kilometres away. It worked.
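Here's the exact scenario. Three tasks, one mutex. LOW takes the mutex. HIGH wakes up, preempts LOW, and blocks on that same mutex. Then MEDIUM becomes ready and preempts LOW, which is still holding the mutex. For as long as MEDIUM has work to do, LOW can't run, the mutex is never released, and HIGH waits.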
The scheduler isn't broken. It's doing exactly what you told it to do. HIGH is blocked (legitimately waiting on a mutex). MEDIUM is ready. So MEDIUM runs. The scheduler cannot know that MEDIUM's execution is indirectly preventing HIGH from getting what it needs.
The fix is priority inheritance: when HIGH blocks on a mutex held by LOW, temporarily raise LOW's priority to match HIGH's. Now MEDIUM can't preempt LOW. LOW finishes, releases the mutex, its priority drops back to normal, and HIGH gets what it was waiting for.
The fix — one word difference, massive impact
C
/* ❌ WRONG — binary semaphore has NO priority inheritance */
xMutex = xSemaphoreCreateBinary();
/* ✅ CORRECT — mutex implements priority inheritance automatically */
xMutex = xSemaphoreCreateMutex();
/* That's it. That's the entire fix.
When a high-priority task blocks on this mutex, FreeRTOS
automatically boosts the holding task's priority.
No code changes anywhere else required. */
/* The general rule: if you're protecting a shared resource
(SPI bus, I2C peripheral, buffer, state) — use a MUTEX.
Binary semaphores are for signalling, not resource protection. */
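To watch the three-task scenario play out without any hardware, here is a tick-by-tick toy simulation. Everything in it is an assumption made for illustration: the task table, the arrival times, the work lengths and the name `high_finish_tick` are mine, and the model is far simpler than a real kernel (taking the mutex costs no time, and a task holds it for all of its work).

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MEDIUM, HIGH, NUM_TASKS };

typedef struct {
    int  base_prio;    /* priority assigned at creation               */
    int  prio;         /* effective priority (may be boosted)         */
    int  arrival;      /* tick at which the task becomes Ready        */
    int  work;         /* ticks of CPU time still needed              */
    bool needs_mutex;  /* true if all of its work is under the mutex  */
} SimTask;

/* Returns the tick at which HIGH finishes, or -1 if it never does. */
int high_finish_tick(bool use_inheritance)
{
    SimTask t[NUM_TASKS] = {
        [LOW]    = { 1, 1, 0, 4, true  },
        [MEDIUM] = { 2, 2, 2, 5, false },
        [HIGH]   = { 3, 3, 1, 1, true  },
    };
    int owner = -1;    /* index of the task holding the mutex, -1 if free */

    for (int now = 0; now < 100; now++) {
        /* Pass 1: a task blocked on the mutex lends its priority to the holder. */
        for (int i = 0; i < NUM_TASKS; i++) {
            bool waiting = t[i].arrival <= now && t[i].work > 0 &&
                           t[i].needs_mutex && owner != -1 && owner != i;
            if (waiting && use_inheritance && t[owner].prio < t[i].prio)
                t[owner].prio = t[i].prio;
        }

        /* Pass 2: fixed-priority pick among tasks that are Ready and not blocked. */
        int run = -1;
        for (int i = 0; i < NUM_TASKS; i++) {
            bool ready   = t[i].arrival <= now && t[i].work > 0;
            bool blocked = t[i].needs_mutex && owner != -1 && owner != i;
            if (ready && !blocked && (run == -1 || t[i].prio > t[run].prio))
                run = i;
        }
        if (run == -1)
            continue;                       /* idle tick */

        if (t[run].needs_mutex) {
            assert(owner == -1 || owner == run);
            owner = run;                    /* take (or keep) the mutex */
        }
        if (--t[run].work == 0) {
            if (owner == run) {             /* done: release, drop any boost */
                owner = -1;
                t[run].prio = t[run].base_prio;
            }
            if (run == HIGH)
                return now + 1;
        }
    }
    return -1;
}
```

HIGH becomes ready at tick 1 and needs a single tick of CPU. In this model `high_finish_tick(false)` returns 10 because MEDIUM's five ticks land in the middle of LOW's critical section, while `high_finish_tick(true)` returns 5. Make MEDIUM's work longer and the first number grows with it; the second does not.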
Scheduling Algorithms — Your Actual Options
Most embedded RTOS implementations give you Fixed-Priority Preemptive Scheduling. Tasks have static priorities. Highest-priority ready task runs. Higher-priority tasks preempt lower ones immediately. Clean. Simple. Auditable. Use it.
The theory behind priority assignment is Rate Monotonic Analysis (RMA): assign higher priority to tasks with shorter periods. A task running every 1ms gets a higher priority than one running every 10ms. For independent periodic tasks whose deadlines equal their periods, this is provably optimal: if any fixed-priority assignment can meet all deadlines, the rate-monotonic assignment will too.
📐 The Utilization Bound Formula (Worth Memorising)
For n tasks, the system is provably schedulable if:
U = Σ (C_i / T_i) ≤ n(2^(1/n) − 1)
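Where C_i = worst-case execution time, T_i = period. As n grows, this approaches ln(2) ≈ 69.3%. If your total utilization is under that, periodic tasks with rate-monotonic priorities are guaranteed to meet their deadlines, provided they don't block each other. The bound is sufficient, not necessary: above it your task set may still be schedulable, but you need to run Response Time Analysis to be sure.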
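Both checks fit in a few lines of C. This is a sketch under the textbook assumptions (independent periodic tasks, deadline equal to period, tasks listed highest priority first), with names I invented: `PeriodicTask`, `rm_bound`, `utilization` and `response_time`. The n-th root of 2 is computed with Newton's method only so the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long wcet;     /* C_i: worst-case execution time, in microseconds */
    long period;   /* T_i: period (and deadline), in microseconds     */
} PeriodicTask;

/* Utilization bound n * (2^(1/n) - 1). */
double rm_bound(int n)
{
    assert(n >= 1);
    double x = 1.0 + 1.0 / n;            /* starts at or above the root */
    for (int iter = 0; iter < 30; iter++) {
        double p = 1.0;                  /* p = x^(n-1) */
        for (int k = 0; k < n - 1; k++)
            p *= x;
        x -= (p * x - 2.0) / (n * p);    /* Newton step on x^n - 2 */
    }
    return n * (x - 1.0);
}

/* Total utilization U = sum of C_i / T_i. */
double utilization(const PeriodicTask *t, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)t[i].wcet / (double)t[i].period;
    return u;
}

/* Response Time Analysis for task i: iterate
 *   R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j
 * until it stops changing. Returns -1 if R would exceed the period. */
long response_time(const PeriodicTask *t, size_t i)
{
    long r = t[i].wcet;
    for (;;) {
        long next = t[i].wcet;
        for (size_t j = 0; j < i; j++) {
            long releases = (r + t[j].period - 1) / t[j].period;  /* ceil */
            next += releases * t[j].wcet;
        }
        if (next > t[i].period)
            return -1;
        if (next == r)
            return r;
        r = next;
    }
}
```

For a made-up set of three tasks (200µs every 1ms, 2ms every 10ms, 20ms every 100ms) the utilization is 0.6, below `rm_bound(3)` ≈ 0.78, and the worst-case response times of the second and third tasks work out to 2600µs and 35000µs.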
| Property | Cooperative | Preemptive |
|---|---|---|
| Context switches when | Task explicitly yields | Any tick or ISR return |
| Worst-case response to a high-priority event | Unbounded (waits for the running task to yield) | Bounded (ISR time plus a context switch) |
| Race conditions | Fewer — natural protection | Must use mutexes |
| Hard real-time | No | Yes |
| Debug difficulty | Easier | Timing-dependent bugs |
| When to use | Tiny MCUs, all tasks same priority | Anything with mixed timing requirements |
In practice, use preemptive. Cooperative scheduling is a useful teaching tool and occasionally appropriate for deeply resource-constrained parts, but if you're on a Cortex-M with an RTOS, you want preemption. You're not writing an Arduino sketch.
Patterns That Actually Work in Production
1. Keep ISRs Stupid Short
An ISR that does actual work is a bug waiting to happen. You're running at interrupt priority: only the FromISR variants of the FreeRTOS API are safe to call, no task can run until you return, and you're eating into the latency of every interrupt at the same or lower priority. Post to a queue and return. Let a task do the work.
The deferred interrupt pattern — the right way to handle hardware events
C
/* ISR: as short as humanly possible */
void USART1_IRQHandler( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    uint8_t byte = USART_ReceiveData( USART1 );

    xQueueSendFromISR( xRXQueue, &byte, &xHigherPriorityTaskWoken );

    /* If posting unblocked a higher-priority task, trigger a context
       switch before this ISR returns. No delay. Immediate handoff. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

/* Task: does the actual work at task priority */
void vUARTTask( void *pvParams )
{
    uint8_t byte;

    for(;;) {
        /* Zero CPU usage while nothing arrives */
        if( xQueueReceive( xRXQueue, &byte, pdMS_TO_TICKS(100) ) )
            vProcessByte( byte );
    }
}
2. Use vTaskDelayUntil, Not vTaskDelay
This one bites people constantly. vTaskDelay() counts from the moment you call it, not from when the task last woke up. So if your task body takes 2ms and you delay for 10ms, your actual period is 12ms. And it drifts, because every variation in the body's run time moves the next wake-up. Use vTaskDelayUntil() for anything periodic: it measures from the last wake time, so execution time doesn't accumulate into your period.
periodic_task.c — do it this way, every time
C
void vMotorControlTask( void *pvParams )
{
    TickType_t xLastWake = xTaskGetTickCount();
    /* 1ms hard. Only meaningful if configTICK_RATE_HZ is at least 1000. */
    const TickType_t xPeriod = pdMS_TO_TICKS( 1 );

    for(;;) {
        vTaskDelayUntil( &xLastWake, xPeriod );
        /* Blocks until (xLastWake + xPeriod), then updates xLastWake.
           Even if you ran long last iteration, the next wake time
           is still correct. No drift. */
        vReadEncoders();
        vRunPIDLoop();
        vSetPWMOutputs();
    }
}
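The difference between the two calls is plain arithmetic, and a host-side model makes the drift visible. This is a sketch only: `start_tick_relative` and `start_tick_absolute` are hypothetical names, time is counted in whole ticks, and the loop body is assumed to take the same number of ticks every iteration.

```c
#include <assert.h>

/* Tick at which iteration n starts when the task sleeps for `delay` ticks
 * counted from the moment it asks to sleep (vTaskDelay-style). */
unsigned long start_tick_relative(unsigned n, unsigned long delay,
                                  unsigned long body)
{
    unsigned long now = 0;
    for (unsigned i = 0; i < n; i++) {
        now += body;     /* do the work */
        now += delay;    /* sleep, counted from "now" */
    }
    return now;
}

/* Same loop, but each wake-up is computed from the previous wake time
 * (vTaskDelayUntil-style). */
unsigned long start_tick_absolute(unsigned n, unsigned long period,
                                  unsigned long body)
{
    assert(period > 0);
    unsigned long now = 0, next_wake = 0;
    for (unsigned i = 0; i < n; i++) {
        now += body;             /* do the work */
        next_wake += period;     /* target is previous wake time + period */
        if (now < next_wake)
            now = next_wake;     /* sleep until then; if already late, don't */
    }
    return now;
}
```

With a 2-tick body and a 10-tick delay, iteration 100 starts at tick 1200 in the relative model and at tick 1000 in the absolute one. The absolute model only falls behind when the body takes longer than the period, and at that point no delay function can save you.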
3. Enable Stack Overflow Detection, Always
FreeRTOSConfig.h + hooks.c
C
/* FreeRTOSConfig.h: task stacks are filled with a 0xA5 pattern when
   they are created. Method 2 checks on every context switch that the
   end of the stack still holds it. Catches most overflows early. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* hooks.c */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcName )
{
    /* Stack is corrupt, so do NOT call any FreeRTOS API here */
    taskDISABLE_INTERRUPTS();

    /* Hang. Let the debugger catch it or watchdog reset.
       pcName will tell you which task overflowed.
       Then go back and double its stack size. */
    for(;;);
}

/* After a while, use this to check headroom in normal operation
   (needs INCLUDE_uxTaskGetStackHighWaterMark set to 1).
   The result is the smallest amount of free stack seen so far, in words. */
UBaseType_t remaining = uxTaskGetStackHighWaterMark( xMyTask );
/* If this is under ~50 words, size up. */
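The watermark idea itself is simple enough to model on a host: paint the stack with a known pattern, let the task run, then count how much paint is left at the far end. The sketch below is my own illustration with made-up names (`stack_paint`, `stack_simulate_use`, `stack_high_water_mark`), not the kernel's implementation. It assumes a stack that grows downward, so index 0 is the last word to be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_WORD UINT32_C(0xA5A5A5A5)

/* Fill the whole stack with the pattern before the task ever runs. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++)
        stack[i] = FILL_WORD;
}

/* Pretend the task pushed `used` words. The stack grows downward,
 * so usage eats into the array from the high-index end. */
void stack_simulate_use(uint32_t *stack, size_t words, size_t used)
{
    assert(used <= words);
    for (size_t i = words - used; i < words; i++)
        stack[i] = 0u;
}

/* Count the words at the low end that still hold the pattern:
 * the minimum free space the task has ever had. */
size_t stack_high_water_mark(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == FILL_WORD)
        untouched++;
    return untouched;
}
```

Paint 64 words, use 24, and the mark reads 40. Use only 10 afterwards and it still reads 40, which is the point: it records the worst case so far, not the current depth. One known blind spot of any pattern check is a task that happens to write the pattern value itself.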
Mistakes I've Seen (And Made)
No judgment here. These are real mistakes from real systems, some of them mine.
| The Mistake | What You'll See | The Fix |
|---|---|---|
| Calling blocking API from ISR | Hard fault, immediate crash, watchdog reset | Use xQueueSendFromISR() and friends |
| Binary semaphore as mutex | Intermittent timing violations, priority inversion | xSemaphoreCreateMutex() — always |
| vTaskDelay() for periodic tasks | Gradual period drift, cumulative jitter | vTaskDelayUntil() — no exceptions |
| Stack too small | Corrupted globals, random crashes, hours of debugging | Enable overflow check, use watermark API |
| Acquiring mutexes in different orders | Deadlock. Full stop. System hangs forever. | Global mutex acquisition order. Document it. Enforce it. |
| Task that never blocks | Everything starves. System appears frozen at lower priorities | Every task must call a blocking API somewhere in its loop |
| Calling blocking FreeRTOS APIs before vTaskStartScheduler() | Silent corruption, crash on first switch | Initialise hardware and create tasks and queues in main; do everything else inside tasks |
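One cheap way to enforce the "global mutex acquisition order" row is to give every mutex a rank and check it in debug builds. The sketch below is a hypothetical helper, not a FreeRTOS feature: each task keeps a bitmask of the ranks it currently holds, and a new lock is only legal if its rank is higher than everything already held. The function names and the 32-rank limit are my own choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* held_mask: bit R set means this task currently holds the lock of rank R. */
bool may_acquire(uint32_t held_mask, unsigned rank)
{
    assert(rank < 32u);
    /* Legal only if every lock already held has a strictly lower rank. */
    return (held_mask >> rank) == 0u;
}

/* Record a successful take; traps an out-of-order take in debug builds. */
uint32_t note_acquired(uint32_t held_mask, unsigned rank)
{
    assert(may_acquire(held_mask, rank));
    return held_mask | (UINT32_C(1) << rank);
}

/* Record a give. */
uint32_t note_released(uint32_t held_mask, unsigned rank)
{
    assert(rank < 32u);
    return held_mask & ~(UINT32_C(1) << rank);
}
```

A task holding rank 2 may take rank 5, but not rank 1, and not rank 2 again. If every task follows the same ranking, no two tasks can ever wait on each other in a cycle, which is what a deadlock needs.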
Conclusion
Embedded systems engineering in India has matured fast. The engineers writing firmware for automotive ECUs, medical devices, and industrial controllers across Bangalore, Pune, and Hyderabad are dealing with exactly these problems — priority inversion on a CAN bus, stack overflows at 3am, a watchdog nobody can explain.
Institutes like the Indian Institute of Embedded Systems (IIES) exist because this gap between knowing C and understanding what the kernel is actually doing underneath is real, and it costs production hours.
But no course closes that gap alone. The concepts in this post — task states, context switching, priority inheritance — only become instinct after you've broken something in production and had to find it.
Read the theory. Then go write a task, starve it on purpose, and debug it yourself. That's the part nobody can teach you.


