DEV Community

Ahmet Can Gulmez


ELF Executable Analysis in Detail

Every day we run programs to get our work done, and they come in many forms: GUIs, CLIs, TUIs and so on. At a lower level, though, a program is stored in an executable file format, and the two most widespread ones are PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux and most other Unix-like systems.

In this tutorial, I will explain ELF executables in detail.

Let's start with the overall layout. An ELF executable consists of four parts:

  • Executable Header

  • Program Headers

  • Sections

  • Section Headers

Executable Header

Every ELF file starts with an executable header, a structured series of bytes telling you that it's an ELF file, what kind of ELF file it is, and where in the file to find all the other contents. On Linux it is declared in /usr/include/elf.h as the Elf64_Ehdr struct (Elf32_Ehdr for 32-bit files). Its fields are:

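  • e_ident: A 16-byte array at the very start of the file. The first 4 bytes are the magic value 0x7f 'E' 'L' 'F', identifying the file as an ELF binary; the following bytes give the class (32- or 64-bit), the data encoding (endianness), the ELF version, the OS/ABI and the ABI version, followed by padding. All 16 bytes appear on the Magic line of the readelf output below.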

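  • e_type: The type of the file. For example, REL means a relocatable object file, EXEC means a fixed-address executable, and DYN means a shared object (dynamic library). Position-independent executables (PIE), which modern Linux distributions build by default, are also of type DYN, as the readelf output below shows.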

  • e_machine: The architecture that the executable is intended to run on.

  • e_version: The version of the ELF specification. It is always 1 (EV_CURRENT), the only version defined so far; the field leaves room for future revisions of the format.

  • e_entry: The virtual address at which execution should start.

  • e_phoff, e_shoff: The file offsets to the beginning of the program header table and the section header table.

  • e_flags: The flags specific to the architecture for which the binary is compiled (typically 0 for x86_64).

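  • e_ehsize: The size of the executable header in bytes (64 for 64-bit ELF files, 52 for 32-bit ones).

  • e_phentsize, e_phnum: The size in bytes of each entry in the program header table, and the number of entries.

  • e_shentsize, e_shnum: The same two values for the section header table.

  • e_shstrndx: The index, in the section header table, of the string table that holds the section names (.shstrtab). In the output below it is 28.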
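To make these fields concrete, here is a minimal sketch in C that validates and copies an Elf64_Ehdr out of a byte buffer, using only the types and constants from <elf.h>. It assumes a 64-bit ELF file whose byte order matches the host (little-endian, as on x86-64); parse_ehdr and ehdr_type_name are hypothetical helper names, not part of any library.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 64-bit executable header is always 64 bytes ("Size of this header"). */
_Static_assert(sizeof(Elf64_Ehdr) == 64, "unexpected Elf64_Ehdr size");

/* Hypothetical helper: copy the executable header out of buf and check that
 * it looks like a 64-bit ELF file. Returns 0 on success, -1 otherwise.
 * Assumes the file's byte order matches the host (little-endian x86-64). */
int parse_ehdr(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof(Elf64_Ehdr))
        return -1;
    memcpy(out, buf, sizeof(Elf64_Ehdr));

    /* First 4 bytes of e_ident: 0x7f 'E' 'L' 'F' (ELFMAG, SELFMAG == 4). */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* e_ident[EI_CLASS] distinguishes 32-bit from 64-bit files. */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    /* e_ehsize should match the size of the struct we just copied. */
    if (out->e_ehsize != sizeof(Elf64_Ehdr))
        return -1;
    return 0;
}

/* Hypothetical helper: map e_type to the short names readelf prints. */
const char *ehdr_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_REL:  return "REL";   /* relocatable object file (.o) */
    case ET_EXEC: return "EXEC";  /* fixed-address executable */
    case ET_DYN:  return "DYN";   /* shared library or PIE executable */
    case ET_CORE: return "CORE";  /* core dump */
    default:      return "UNKNOWN";
    }
}
```

On a real binary you would first read at least the first 64 bytes of the file into the buffer (for example with fopen and fread); the resulting e_entry, e_phoff and e_shoff should match what readelf -h prints.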

You can see the executable header with this command:

$ readelf -h ./prog
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1400
  Start of program headers:          64 (bytes into file)
  Start of section headers:          135496 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28

Section Headers

The code and data in an ELF binary are logically divided into contiguous, non-overlapping chunks called sections. Sections don't have a predetermined structure; the layout of each section depends on its contents. Every section is described by an entry in the section header table, which starts at file offset e_shoff. On Linux, each entry is an Elf64_Shdr struct (Elf32_Shdr for 32-bit files) declared in /usr/include/elf.h. Its fields are:

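  • sh_name: The name of the section, stored as a byte offset into the section header string table (.shstrtab), whose index is given by e_shstrndx in the executable header.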

  • sh_type: Every section has a type, indicated by an integer field. Common types: PROGBITS for program-defined contents such as machine code or initialized data, NOBITS for sections that occupy no space in the file (such as .bss), SYMTAB for the static symbol table, DYNSYM for the dynamic symbol table, STRTAB for string tables, REL/RELA for relocation entries, DYNAMIC for information needed for dynamic linking, and so on.

  • sh_flags: The section flags. Common flags: W (SHF_WRITE) means the section is writable at run time, A (SHF_ALLOC) means the contents of the section are loaded into memory when the executable runs, and X (SHF_EXECINSTR) means the section contains executable instructions.

  • sh_addr, sh_offset, sh_size: The virtual address, file offset (in bytes from the start of the file), and size (in bytes) of the section.

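  • sh_link: Sometimes there are relationships between sections. This field holds the index of a related section; for example, the sh_link of .dynsym points to .dynstr, which holds the symbol names (in the output below, [6] .dynsym has Lk 7).

  • sh_info: Additional information whose meaning depends on the section type. For example, for a relocation section with the I flag, it holds the index of the section the relocations apply to (in the output below, .rela.plt has Inf 24, which is .got).

  • sh_addralign: Some sections need to be aligned in memory in a particular way for efficient memory access; sh_addr must then be a multiple of sh_addralign. The values 0 and 1 mean no special alignment is required.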

  • sh_entsize: Some sections contain a table of well-defined data structures. For such sections, this field indicates the size in bytes of each entry in the table. When the field is unused, it is set to zero.
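Because sh_name is an offset into .shstrtab rather than a name, looking up a section by name takes two steps: use e_shstrndx to fetch the string table's own section header, then compare the NUL-terminated string found at each sh_name offset. Below is a minimal sketch of that lookup over an in-memory copy of the whole file. It assumes a 64-bit ELF in host byte order and ignores the SHN_XINDEX escape used by files with very many sections; find_section is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the section called `name` in the ELF image
 * buf[0..len). Returns its index and copies its header to *out (if non-NULL),
 * or returns -1 if it is not found or the headers are out of bounds. */
int find_section(const unsigned char *buf, size_t len, const char *name,
                 Elf64_Shdr *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);

    /* The section header table (e_shoff, e_shnum entries) must fit in buf. */
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
        (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return -1;
    /* e_shstrndx is the index of .shstrtab, the table of section names. */
    if (eh.e_shstrndx >= eh.e_shnum)
        return -1;

    Elf64_Shdr strtab;
    memcpy(&strtab, buf + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof(Elf64_Shdr),
           sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;
    const char *names = (const char *)buf + strtab.sh_offset;

    for (int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, buf + eh.e_shoff + (size_t)i * sizeof(Elf64_Shdr), sizeof sh);
        /* sh_name is a byte offset into .shstrtab, not a section index. */
        if (sh.sh_name >= strtab.sh_size)
            continue;
        const char *s = names + sh.sh_name;
        if (memchr(s, '\0', strtab.sh_size - sh.sh_name) == NULL)
            continue; /* name is not NUL-terminated inside the table */
        if (strcmp(s, name) == 0) {
            if (out != NULL)
                *out = sh;
            return i;
        }
    }
    return -1;
}
```

Run on the example program's bytes, find_section(buf, len, ".text", &sh) should return 16 with sh.sh_addr == 0x1400, matching the readelf output below.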

You can see the section headers with this command:

$ readelf --sections --wide ./prog
There are 29 section headers, starting at offset 0x21148:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        0000000000000318 000318 00001c 00   A  0   0  1
  [ 2] .note.gnu.property NOTE            0000000000000338 000338 000030 00   A  0   0  8
  [ 3] .note.gnu.build-id NOTE            0000000000000368 000368 000024 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            000000000000038c 00038c 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000000000003b0 0003b0 000028 00   A  6   0  8
  [ 6] .dynsym           DYNSYM          00000000000003d8 0003d8 000378 18   A  7   1  8
  [ 7] .dynstr           STRTAB          0000000000000750 000750 00016d 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          00000000000008be 0008be 00004a 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         0000000000000908 000908 000080 00   A  7   2  8
  [10] .rela.dyn         RELA            0000000000000988 000988 0000d8 18   A  6   0  8
  [11] .rela.plt         RELA            0000000000000a60 000a60 0002d0 18  AI  6  24  8
  [12] .init             PROGBITS        0000000000001000 001000 00001b 00  AX  0   0  4
  [13] .plt              PROGBITS        0000000000001020 001020 0001f0 10  AX  0   0 16
  [14] .plt.got          PROGBITS        0000000000001210 001210 000010 10  AX  0   0 16
  [15] .plt.sec          PROGBITS        0000000000001220 001220 0001e0 10  AX  0   0 16
  [16] .text             PROGBITS        0000000000001400 001400 018c54 00  AX  0   0 16
  [17] .fini             PROGBITS        000000000001a054 01a054 00000d 00  AX  0   0  4
  [18] .rodata           PROGBITS        000000000001b000 01b000 002ee8 00   A  0   0 16
  [19] .eh_frame_hdr     PROGBITS        000000000001dee8 01dee8 0005fc 00   A  0   0  4
  [20] .eh_frame         PROGBITS        000000000001e4e8 01e4e8 001914 00   A  0   0  8
  [21] .init_array       INIT_ARRAY      0000000000020cc0 020cc0 000008 08  WA  0   0  8
  [22] .fini_array       FINI_ARRAY      0000000000020cc8 020cc8 000008 08  WA  0   0  8
  [23] .dynamic          DYNAMIC         0000000000020cd0 020cd0 000200 10  WA  7   0  8
  [24] .got              PROGBITS        0000000000020ed0 020ed0 000130 08  WA  0   0  8
  [25] .data             PROGBITS        0000000000021000 021000 000010 00  WA  0   0  8
  [26] .bss              NOBITS          0000000000021020 021010 000010 00  WA  0   0 32
  [27] .comment          PROGBITS        0000000000000000 021010 00002b 01  MS  0   0  1
  [28] .shstrtab         STRTAB          0000000000000000 02103b 00010a 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Sections

In the previous section, you saw that an ELF executable contains a set of sections, each with a specific purpose. Let's go over the most important ones:

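  • .init and .fini: The .init section contains machine code that performs initialization before main() runs, and .fini contains code that runs when the program exits. (Modern toolchains mostly register such functions as pointers in .init_array and .fini_array instead.)

  • .text: This section contains the main code of the executable, the code we've written, plus startup code added by the compiler and the C library. It's the main focus of binary analysis and reverse engineering efforts.

  • .bss, .data, .rodata: These sections hold the program's global and static variables, and the way a variable is declared decides which one it goes into: initialized variables live in .data, uninitialized (or zero-initialized) ones live in .bss, and read-only data such as constants and string literals live in .rodata. .bss takes up no space in the file (its type is NOBITS); the loader simply reserves zero-filled memory for it. Local variables don't live in any of these sections: they are created on the stack at run time by the code in .text.

  • .rel.* and .rela.*: These sections contain relocation entries, which the linker (and, for .rela.dyn and .rela.plt, the dynamic linker) uses to patch addresses.

  • .dynamic: It contains the "road map" the dynamic linker follows when loading and setting up the ELF executable, such as the shared libraries it needs and the locations of the dynamic symbol and string tables.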
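A small C file makes the mapping between declarations and sections concrete. The comments describe what GCC and Clang typically do on x86-64 Linux; the compiler and its flags have the final say, so treat them as expectations to check rather than guarantees. The names are purely illustrative.

```c
/* Where GCC/Clang typically place each object on x86-64 Linux.
 * Verify on your own build with `objdump -t` or `readelf -s`. */
int counter = 42;               /* initialized global    -> .data   */
int zeroed;                     /* uninitialized global  -> .bss    */
const int limit = 100;          /* read-only global      -> .rodata */
const char message[] = "hello"; /* read-only array       -> .rodata */

/* The function's machine code goes into .text. */
int sum_to(int n)
{
    int total = 0; /* local: on the stack or in a register at run time, no section */
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Compiling it with gcc -c and running objdump -t on the object file should list counter under .data, zeroed under .bss, and limit and message under .rodata. Older GCC versions may show zeroed as *COM* (a common symbol) in the object file; it still ends up in .bss of the linked executable.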

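As an example, you can disassemble a specific section with objdump. Note that .text starts at 0x1400, the entry point reported by readelf -h, so the first instructions below are the startup code that runs before main():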

$ objdump -j .text -d ./prog

./prog:     file format elf64-x86-64

Disassembly of section .text:

0000000000001400 <.text>:
    1400:       f3 0f 1e fa             endbr64
    1404:       31 ed                   xor    %ebp,%ebp
    1406:       49 89 d1                mov    %rdx,%r9
    1409:       5e                      pop    %rsi
    140a:       48 89 e2                mov    %rsp,%rdx
    140d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    1411:       50                      push   %rax
    1412:       54                      push   %rsp
    1413:       45 31 c0                xor    %r8d,%r8d
    1416:       31 c9                   xor    %ecx,%ecx
    1418:       48 8d 3d 48 15 00 00    lea    0x1548(%rip),%rdi        # 2967 <rand@plt+0x1577>
    141f:       ff 15 b3 fb 01 00       call   *0x1fbb3(%rip)        # 20fd8 <rand@plt+0x1fbe8>
    1425:       f4                      hlt
    1426:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    142d:       00 00 00 
    1430:       48 8d 3d d9 fb 01 00    lea    0x1fbd9(%rip),%rdi        # 21010 <rand@plt+0x1fc20>
    1437:       48 8d 05 d2 fb 01 00    lea    0x1fbd2(%rip),%rax        # 21010 <rand@plt+0x1fc20>
    143e:       48 39 f8                cmp    %rdi,%rax
    1441:       74 15                   je     1458 <rand@plt+0x68>
    1443:       48 8b 05 96 fb 01 00    mov    0x1fb96(%rip),%rax        # 20fe0 <rand@plt+0x1fbf0>
    144a:       48 85 c0                test   %rax,%rax
    144d:       74 09                   je     1458 <rand@plt+0x68>
    144f:       ff e0                   jmp    *%rax
    1451:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

(...)

Program Headers

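The program header table provides a segment view of the binary, as opposed to the section view provided by the section header table. Sections are used at link time, while segments are what the operating system and the dynamic linker use at run time to map the file into memory; a segment covers zero or more sections (see the section-to-segment mapping at the end of the output below). Each entry is an Elf64_Phdr struct (Elf32_Phdr for 32-bit files) in /usr/include/elf.h. Its fields are: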

  • p_type: The type of the segment. Common types: LOAD means the segment is to be loaded into memory; INTERP contains the .interp section, which names the interpreter (the dynamic linker) used to load the executable; DYNAMIC contains the .dynamic section, which tells the interpreter how to parse and prepare the binary for execution; PHDR covers the program header table itself.

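  • p_flags: The runtime access permissions of the segment: R (readable), W (writable) and X (executable), which readelf prints as E.

  • p_offset, p_vaddr, p_paddr, p_filesz, p_memsz: The file offset, virtual address, physical address, size in the file, and size in memory of the segment, analogous to sh_offset, sh_addr and sh_size for sections. p_paddr is not used for ordinary Linux executables and usually equals p_vaddr. p_memsz can be larger than p_filesz; the extra bytes are zero-filled at load time, which is how .bss gets its memory (in the output below, the last LOAD segment has FileSiz 0x350 and MemSiz 0x370).

  • p_align: The alignment of the segment, analogous to sh_addralign. For LOAD segments it is typically the page size (0x1000 in the output below), and p_offset and p_vaddr must be congruent modulo p_align.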
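The relationship between p_offset, p_vaddr, p_filesz and p_memsz is what lets a tool translate a virtual address (say, from a disassembly) back to a position in the file. Here is a minimal sketch, assuming the program headers have already been read into an array of Elf64_Phdr; vaddr_to_offset and phdr_flags_str are hypothetical helper names.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a virtual address to a file offset using the
 * PT_LOAD segment that contains it. Returns -1 if no LOAD segment maps the
 * address, or if it falls in the zero-filled tail (p_filesz <= delta <
 * p_memsz, e.g. .bss), which has no bytes in the file. */
int64_t vaddr_to_offset(const Elf64_Phdr *phdrs, size_t phnum, uint64_t vaddr)
{
    for (size_t i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue; /* only LOAD segments are mapped from the file */
        if (vaddr < ph->p_vaddr || vaddr - ph->p_vaddr >= ph->p_memsz)
            continue; /* address is outside this segment in memory */
        uint64_t delta = vaddr - ph->p_vaddr;
        if (delta >= ph->p_filesz)
            return -1; /* mapped, but only zero-filled memory */
        return (int64_t)(ph->p_offset + delta);
    }
    return -1;
}

/* Hypothetical helper: render p_flags the way readelf does ("R E", "RW "). */
void phdr_flags_str(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & PF_R) ? 'R' : ' ';
    out[1] = (p_flags & PF_W) ? 'W' : ' ';
    out[2] = (p_flags & PF_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

With the LOAD segments printed below, address 0x1400 (the entry point) maps to file offset 0x1400, while 0x21020 (the start of .bss) lies in the RW segment's zero-filled tail and has no bytes in the file.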

You can see the program headers with this command:

$ readelf --segments --wide ./prog

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1400
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R   0x8
  INTERP         0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000d30 0x000d30 R   0x1000
  LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x019061 0x019061 R E 0x1000
  LOAD           0x01b000 0x000000000001b000 0x000000000001b000 0x004dfc 0x004dfc R   0x1000
  LOAD           0x020cc0 0x0000000000020cc0 0x0000000000020cc0 0x000350 0x000370 RW  0x1000
  DYNAMIC        0x020cd0 0x0000000000020cd0 0x0000000000020cd0 0x000200 0x000200 RW  0x8
  NOTE           0x000338 0x0000000000000338 0x0000000000000338 0x000030 0x000030 R   0x8
  NOTE           0x000368 0x0000000000000368 0x0000000000000368 0x000044 0x000044 R   0x4
  GNU_PROPERTY   0x000338 0x0000000000000338 0x0000000000000338 0x000030 0x000030 R   0x8
  GNU_EH_FRAME   0x01dee8 0x000000000001dee8 0x000000000001dee8 0x0005fc 0x0005fc R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x020cc0 0x0000000000020cc0 0x0000000000020cc0 0x000340 0x000340 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 
   03     .init .plt .plt.got .plt.sec .text .fini 
   04     .rodata .eh_frame_hdr .eh_frame 
   05     .init_array .fini_array .dynamic .got .data .bss 
   06     .dynamic 
   07     .note.gnu.property 
   08     .note.gnu.build-id .note.ABI-tag 
   09     .note.gnu.property 
   10     .eh_frame_hdr 
   11     
   12     .init_array .fini_array .dynamic .got 

So far, I've given you a fairly detailed overview of the ELF executable format. In the next articles, I will dive deeper into binary analysis techniques on Linux.

Resource:

Andriesse, D. Practical Binary Analysis. No Starch Press, San Francisco.
