Discussion on: This is how Meltdown works

View post

Replies for: Thanks, Ed! (Is "Ed" good? I'm embarrassed to admit I can't figure out how to parse your name.) It sounds like you understand this stuff pretty we...

While your analogie is fine to introduce the Meltdown concept, is miss an important part. The distinction between User and Kernel space and virtual memory.

On modern OSes, when you try to access a memory address, this address is a virtual address and has to be translated to a physical address first. The map to convert between virtual to physical address is stored in a dedicated piece of hardware (the TLB which is part of the processor). Each process has its own map.

Every time the active process change on the CPU, the kernel has to flush the TLB to load the new process mapping. Today, as an optimization, all majors OSes choose to copy the kernel mapping into each process at launch so when a process call a kernel function, the kernel don't have to flush the TLB and load its own mapping.
The CPU is design to know which part of the mapping is the kernel memory and which part is the process memory. So when trying to access kernel memory from the process, it denies the access.
As seen with Meltdown, this check is performed to late as the access is denied after the memory was loaded.

The patch adopted by OSes to mitigate the issue is to separate the Kernel memory map from the process map. So when a process try to access kernel memory, the speculative execution failed to map it to physical address (as the kernel map is not present anymore) and return a exception instead of loading the actual kernel memory.

This patch has a performance cost as it force to revert an useful optimization. Fortunately, from some time now, CPUs provide functions to optimize usage of the TLB and avoid flush and reload of the mapping when a process change (by allowing to store more than one process map and tagging them with Context Identifier), so the performance cost should be small enough to be invisible for most users.