Aayush Sharma

Posted on Jan 7

Segmentation and Memory

#architecture #computerscience

The Great Memory Illusion: How Your OS Lies to Your Programs

A deep dive into Segmentation, GDT, LDT, and why modern systems had to move on.

Computers have two main kinds of memory: ROM (Read-Only Memory) and RAM (Random Access Memory). ROM stores the BIOS/UEFI firmware used to boot the machine, while RAM holds the volatile data of the processes currently running.

One of the major challenges in the evolution of computers was deciding how to assign RAM to running processes. The OS creates processes, and it is also responsible for assigning them memory in RAM.

What is Segmentation?

One of the approaches to assigning memory is segmentation. In segmentation, when a process is created, we divide the process into logical parts like Code, Data, Stack, Heap, etc.

For example:
When a process arrives, it is logically divided into Code (instructions), Data (variables), Stack (function calls), and Heap (dynamic memory), and then stored in the RAM.

One important point that separates this from contiguous allocation is that we can store the Code, Data, and Stack segments in different locations in RAM.

But the catch is that the process doesn’t know the address where it is stored. We give the process an illusion. The process sees a logical address space composed of multiple independent segments, each starting at offset zero and bounded by a limit. For example, the Code part sees its space from 0 to limit, and the same applies to others.

When we execute the process, it uses addresses to access the Data segment or Stack values from RAM, but the address it uses is not real. That is why that address is called a Virtual Address. Since we know this address is not real, it is now the duty of the CPU and MMU (Memory Management Unit) to calculate the real address and provide the data to the process.

How Virtual Addressing Works

Let us first understand how the virtual address is given to us by the process. The virtual address is given in the form of:

Virtual Address : A:B

A: The segment selector, which picks an entry in the LDT or GDT (Local/Global Descriptor Table, explained shortly).
B: The Offset.

GDT and LDT Explained

GDT: GDT stands for Global Descriptor Table. It is shared by the whole system, so it holds the descriptors for the kernel's code and data segments and other system-wide descriptors.

LDT: LDT stands for Local Descriptor Table. When a process is loaded into memory for execution, we store the details (descriptors) of its segments in a table, which is that process's LDT. Each descriptor holds many details about its segment, but here are a few important ones for our understanding:

Base: The address at which the segment starts.
Limit: The size of the segment, which bounds the valid offsets.
Access Level / Ring Level: The privilege level (e.g., User Level or Kernel Level).
Type of Segment: (e.g., Code or Data).
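
The descriptor fields listed above can be pictured as one small record per segment. Here is a minimal Python sketch of that idea. It is not the real x86 descriptor layout: the class name SegmentDescriptor, its field names, and all the addresses are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentDescriptor:
    """Simplified descriptor; real x86 descriptors pack these fields into 8 bytes."""
    base: int   # address at which the segment starts
    limit: int  # size of the segment in bytes (simplified)
    ring: int   # privilege level: 0 = kernel, 3 = user
    kind: str   # "code" or "data"


# A hypothetical LDT for one process; the addresses are made up.
ldt = [
    SegmentDescriptor(base=0x4000, limit=0x1000, ring=3, kind="code"),
    SegmentDescriptor(base=0x9000, limit=0x0800, ring=3, kind="data"),
]

for index, descriptor in enumerate(ldt):
    print(index, descriptor.kind, hex(descriptor.base), hex(descriptor.limit))
```

The position of a descriptor in the list plays the role of the index that a selector carries.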

Next, we store the selector values in the segment registers. For example, the Code segment's selector goes in the CS register, the Data segment's in DS, the Stack segment's in SS (Stack Segment), and so on for the extra segment registers (such as ES, FS, and GS).

When the CPU needs the segment's details, it takes the selector from the register and looks up the matching descriptor in the GDT or LDT.

From Virtual to Physical Address

Now let us understand how we get from the Virtual Address (also called the Logical Address) to the Physical Address.

We take the Selector from the appropriate register and look up the descriptor in the respective table (GDT or LDT). Here, we first perform a check to validate the virtual address. We do that by checking whether the Offset exceeds the Limit. If it does, the virtual address is invalid, and the CPU raises a fault instead of accessing memory.

Suppose we have a valid virtual address. We take the Base address from the descriptor and add the Offset. Voila! We get our physical address, and then we access the data we want.

Physical Address = Base + Offset (B)
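
The check-then-add steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table DESCRIPTORS, the function translate, the exception SegmentFault, and every address are hypothetical, and the limit is treated simply as the segment size.

```python
class SegmentFault(Exception):
    """Raised when an offset falls outside its segment."""


# Hypothetical descriptor table: index -> (base, limit). Values are made up.
DESCRIPTORS = {
    0: (0x4000, 0x1000),  # code segment
    1: (0x9000, 0x0800),  # data segment
}


def translate(index: int, offset: int) -> int:
    """Return Base + Offset after validating the offset against the limit."""
    base, limit = DESCRIPTORS[index]
    # Simplification: limit is the segment size, so valid offsets are 0..limit-1.
    if not 0 <= offset < limit:
        raise SegmentFault(f"offset {offset:#x} is outside segment {index}")
    return base + offset


print(hex(translate(1, 0x10)))  # 0x9010
```

Calling translate(1, 0x800) raises SegmentFault in this sketch, because the offset is past the end of the data segment.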

Deep Dive: The Segment Selector

Let us understand the Segment Selector a little deeper.

The Segment Selector is a 16-bit value that tells the CPU where to find the segment's descriptor in the GDT or LDT.

Selector Bit Structure:

Bits 0–1 (RPL): The Requested Privilege Level of the selector, which the CPU takes into account during permission checks when the segment is accessed.
Bit 2 (TI): The Table Indicator. If 0, the GDT is used; if 1, the LDT is used.
Bits 3–15 (Index): The index of the descriptor within the GDT or LDT. With 13 bits, a selector can address up to 8192 entries in a table.
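
To make the bit layout concrete, here is a small Python sketch that splits a 16-bit selector into the three fields described above. The function name decode_selector and the sample value are assumptions chosen for illustration.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its RPL, TI, and Index fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector must fit in 16 bits")
    return {
        "rpl": selector & 0b11,                            # bits 0-1
        "table": "LDT" if (selector >> 2) & 1 else "GDT",  # bit 2 (TI)
        "index": selector >> 3,                            # bits 3-15
    }


# 0x002B is 0b101011: index 5, TI = 0, RPL = 3
print(decode_selector(0x002B))  # {'rpl': 3, 'table': 'GDT', 'index': 5}
```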

Context Switching

Now let us understand what happens when another process comes in. Suppose Process A was running, but now it is preempted, and Process B starts executing. Let us see how this context switch happens.

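When a context switch happens, the current state of Process A needs to be saved, similar to how we save our progress in a video game. The OS saves the current register values, including the segment selectors, in Process A's PCB (Process Control Block). It then loads Process B's saved state into the registers, and Process B starts executing.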

When we resume Process A, we load the saved values from its PCB back into the registers and continue execution where it left off. When Process A finishes, we delete its PCB, free its memory, and invalidate its LDT entries.
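
The save-and-restore idea can be sketched as follows. This is a toy model, not how any real kernel is written: the PCB class, the context_switch function, and all selector values are hypothetical, and only the segment selectors are tracked.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical Process Control Block holding only segment selectors."""
    name: str
    saved_selectors: dict = field(default_factory=dict)


def context_switch(cpu: dict, current: PCB, incoming: PCB) -> None:
    # Save the outgoing process's selectors into its PCB.
    current.saved_selectors = dict(cpu)
    # Load the incoming process's saved selectors into the registers.
    cpu.clear()
    cpu.update(incoming.saved_selectors)


# Made-up selector values for two processes.
cpu_registers = {"CS": 0x0F, "DS": 0x17, "SS": 0x1F}  # Process A is running
pcb_a = PCB("A")
pcb_b = PCB("B", {"CS": 0x27, "DS": 0x2F, "SS": 0x37})

context_switch(cpu_registers, pcb_a, pcb_b)  # A is preempted, B runs
print(cpu_registers)
context_switch(cpu_registers, pcb_b, pcb_a)  # B is preempted, A resumes
print(cpu_registers)
```

After the second switch, the registers hold exactly the values Process A had when it was preempted.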

Why Segmentation Failed

Now let us understand why this wonderful solution for providing memory eventually failed. Two of the main reasons were:

1) External Fragmentation

Consider the memory after a few processes have been assigned space in RAM and some have since exited: the free space is now scattered in gaps between the segments that remain.

Now suppose Process D's Data segment arrives, and it is around 1 GiB in size. If we look at the RAM, we have more than 1 GiB available in total, so we should be able to provide memory. But the issue is that the free space is split into scattered fragments. We don’t have a single contiguous block of 1 GiB.

Because of this, we can't assign Process D's Data segment any memory. We have to either kill the process or wait for memory to be freed. This is one of the biggest problems in segmentation: we are unable to utilize the full memory. These small fragments reduce efficiency, and we end up running fewer processes than the total free memory would allow.
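
Here is a small Python sketch of that situation. The hole sizes, the free_holes list, and the first_fit helper are all made up for illustration; first-fit is just one simple placement policy, not necessarily what a given OS used.

```python
# Free holes as (start, size) in MiB; the numbers are made up.
free_holes = [(0, 400), (1000, 300), (2000, 500)]


def first_fit(holes, size):
    """Return the start of the first hole that can hold `size`, else None."""
    for start, length in holes:
        if length >= size:
            return start
    return None


request = 1024  # Process D's Data segment, about 1 GiB
total_free = sum(length for _, length in free_holes)

print(total_free)                      # 1200
print(total_free >= request)           # True
print(first_fit(free_holes, request))  # None
```

The total free memory is larger than the request, yet no single hole is big enough, so the allocation fails.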

2) Growing Segments

In a process, the Heap and Stack are segments that grow. The Heap grows upwards and the Stack grows downwards. One potential risk is that a growing Heap or Stack runs into the segments of other processes.

The OS is there to prevent this. Suppose the Heap is stored from location 1000 to 2000, but we need more memory and another process’s segment starts at 2001. To fix this, the OS must search memory for a bigger free space, copy the whole Heap segment to that new place, and update the LDT entry and any cached copies of it in the registers. This copying is very slow and reduces the efficiency of execution.
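
The relocation step can be sketched like this. It is a simplified model under stated assumptions: RAM is a bytearray, the function relocate_and_grow and the hole list are hypothetical, and the sketch only does the copy, leaving the descriptor update aside.

```python
def relocate_and_grow(memory: bytearray, base: int, size: int,
                      new_size: int, free_holes: list) -> int:
    """Copy a segment into the first free hole that fits new_size; return the new base."""
    for start, length in free_holes:
        if length >= new_size:
            # The expensive part: every byte of the old segment is copied.
            memory[start:start + size] = memory[base:base + size]
            return start
    raise MemoryError("no free hole is large enough")


memory = bytearray(8000)
memory[1000:2000] = b"h" * 1000           # Heap occupies 1000..1999
free_holes = [(3000, 500), (5000, 3000)]  # (start, length), made up

new_base = relocate_and_grow(memory, base=1000, size=1000,
                             new_size=1500, free_holes=free_holes)
print(new_base)  # 5000
```

The cost of the copy grows with the size of the segment, which is why a large Heap makes this approach painful.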

Conclusion

These significant disadvantages led systems to drop segmentation as the main memory management technique and move towards a better solution: Paging.

Next time we will talk about paging, but did you know modern Operating Systems still use a segmentation feature that can’t be done using paging? I will tell you about that in the next part.
