This blog is part of a series.
We compiled ourselves a binary. How does it get loaded and run, and what does it do while running? Let's start with an example. In the previous post we ran:
[lostghost1@archlinux c]$ ./main
Hello!
What exactly happens when we run ./main?
An OS process gets created and started. A process is an independent unit of execution that has its own context and resources, such as file descriptors and memory.
Firstly, it is important to understand that a process is never created for no reason - it is always started by another process. In this case, the chain of who started whom goes:

./main
<- bash
<- xfce4-terminal
<- xfce4
<- X11
<- sx
<- bash
<- login
<- systemd
<- init (in initramfs)
<- systemd-boot
<- UEFI Firmware
Next, a process is never started from "nothing" - a new program can only be started from within an already running process, with the exec system call. But exec replaces the current program, so you can't get more processes that way - that is what the clone syscall is for.
clone copies the process - and all of its memory. But not quite: a lot of the address space of a process is shared libraries, which don't need to be copied. And the mappings that do need to be copied - for example, the program stack and heap - are initially copied with Copy-on-Write (CoW). This way, only minimal physical memory is needed for the running program, and if the process is later replaced by another one with exec, no extra effort was wasted.
So, when launching a program, first clone is performed, then exec. exec discards the current process memory map, mapping the segments of the target ELF executable in its place. After that, a jump to the ENTRY symbol (usually _start) is performed.
Speaking of mappings - they are performed by the mmap syscall (or the kernel-internal version of it, when the kernel creates mappings implicitly during exec). While programs usually create anonymous mappings for the heap, or have them created automatically for the stack by hitting the guard page, this time it is an actual file mapping - and both the portion of the file being mapped and the memory it is mapped to need to align at page boundaries. If the offset of a segment within the file is not page-aligned, the segment is mapped together with a portion of the neighboring segment; and if it doesn't fill a page exactly, it is zero-extended.
This is how a statically-linked program is launched. What about a dynamically linked one? It requires an interpreter to be launched first. So, for our main:
[lostghost1@archlinux c]$ readelf -a ./main
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000310 0x0000000000000310 R 0x8
INTERP 0x00000000000003b4 0x00000000000003b4 0x00000000000003b4
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
Notice the last line - this means that the program /lib64/ld-linux-x86-64.so.2 is launched instead of our program. Typically this is a dynamic loader - but it can be any program, really!
The linker that runs at compile time assigns addresses where they are needed, exports symbols, and creates relocations. Relocations are needed because libraries should be loadable at any address, so calls into them have to be indirect. What you get is the GOT - a table, modified at runtime, that points to the actual locations of the functions. Think of it this way: a library exports a table of function addresses, but not as direct addresses - it doesn't know where it will be loaded - but as offsets from a "base" address. When the library does get loaded and the base address is established, you only need to add each offset to that base to know where in memory the functions are. This is a simplification, but I feel it is a helpful one.
So, when a dynamically linked program is started, the dynamic linker/loader runs first; it looks at the headers of the executable and finds the list of dynamic libraries that are needed. Then it looks for those libraries - the rules for where to look are listed in /etc/ld.so.conf. There is a cache of library locations - /etc/ld.so.cache. And finally, the search path can be overridden entirely with rpath. Those libraries may themselves depend on other libraries - the search is recursive.
There is one more thing to keep in mind with dynamic libraries - versioning. By convention, libraries have 3 version numbers: breaking ABI release, backwards-compatible ABI release, internal change. So the file on disk is libfoo.so.X.Y.Z. But the program using the library doesn't care about Y and Z - it just needs a compatible X version. So in the program header it requests libfoo.so.X. But then, how does the dynamic linker/loader match the requested libfoo.so.X against the actually existing libfoo.so.X.Y.Z? By the use of symlinks:

libfoo.so -> libfoo.so.X -> libfoo.so.X.Y -> libfoo.so.X.Y.Z

These symlinks are created by ldconfig. Typically it is run after any system upgrade.
Why do we need the symlinks - why not just have libfoo.so on disk? Well, some programs might need different X versions of the same library. In practice, the package manager tracks all of these version numbers, so you never actually hit that situation - but this provides a mechanism in case you do. Ok then, why not have libfoo.so.X on disk and drop the Y and Z? So that, when upgrading a system, you can do it atomically: put the new file alongside the old one, switch the symlink, delete the old file. There is never a point when the library is missing. Of course, now that the industry uses immutable images, this is largely redundant.
And after all that, the dynamic linker mmaps the libraries and our executable into memory, performs relocations, resolves symbols and writes them into the GOT (or delays resolving them - in which case the PLT is used: in short, a special "default" place to jump to for unresolved functions, which resolves the function, writes its address into the GOT, and jumps there - so on subsequent calls the GOT is used right away).
Do we actually need all of this machinery? I'd argue we don't - static binaries that dlopen loadable modules are a much simpler mechanism. But it's still important to know how dynamic libraries work, even if purely for being a well-rounded professional.
This is it for now - in the next blog, we will cover debugging a program. So stay tuned for that!