<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ahmed Farid</title>
    <description>The latest articles on DEV Community by Ahmed Farid (@afarid95).</description>
    <link>https://dev.to/afarid95</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1090834%2Fe821842f-bdf2-4f26-87c7-41ea62197011.jpg</url>
      <title>DEV Community: Ahmed Farid</title>
      <link>https://dev.to/afarid95</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/afarid95"/>
    <language>en</language>
    <item>
      <title>Undefined Reference: The Internals of Object Files and Linking</title>
      <dc:creator>Ahmed Farid</dc:creator>
      <pubDate>Wed, 07 Jun 2023 06:00:00 +0000</pubDate>
      <link>https://dev.to/afarid95/undefined-reference-the-internals-of-object-files-and-linking-3o40</link>
      <guid>https://dev.to/afarid95/undefined-reference-the-internals-of-object-files-and-linking-3o40</guid>
      <description>&lt;p&gt;If you're a C/C++ developer, you've probably encountered this annoying linker error before:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;undefined reference to 'symbol'&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this post, I explain exactly what this error means and why it occurs. We go into the details of object files and the linking process.&lt;/p&gt;

&lt;p&gt;Note: This post assumes a Linux environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;This error means one of the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You declared a function and called it without providing its definition.&lt;/li&gt;
&lt;li&gt;You included a library's header file and called a function declared in it, but didn't link against the library itself.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  C/C++ Compilation Steps
&lt;/h2&gt;

&lt;p&gt;The compilation of a C/C++ program is carried out in these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Preprocessing: Takes the original C/C++ source file, expands &lt;code&gt;#include&lt;/code&gt; directives and macros, and produces an intermediate C/C++ source file.&lt;/li&gt;
&lt;li&gt;Compilation: Takes the intermediate C/C++ source file and produces an assembly code file.&lt;/li&gt;
&lt;li&gt;Assembly: Takes the assembly code file and produces an object file.&lt;/li&gt;
&lt;li&gt;Linking: Takes the object files and links them to produce the final executable file.&lt;/li&gt;
&lt;/ol&gt;
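
&lt;p&gt;To try these steps one at a time, you can feed each intermediate file back to gcc. The file below is a minimal sketch for that purpose: the file name &lt;code&gt;stages.c&lt;/code&gt;, the function &lt;code&gt;square_plus_one&lt;/code&gt; and the second object file &lt;code&gt;driver.o&lt;/code&gt; (assumed to provide &lt;code&gt;main&lt;/code&gt;) are hypothetical. The commands in the comment assume gcc on Linux. As an alternative, &lt;code&gt;gcc -save-temps -c stages.c&lt;/code&gt; runs steps 1 to 3 in one go and keeps the intermediate &lt;code&gt;.i&lt;/code&gt; and &lt;code&gt;.s&lt;/code&gt; files next to the object file.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* stages.c: a hypothetical file for running each step by hand.
 *   gcc -E stages.c -o stages.i    1. preprocess: expand #define and #include
 *   gcc -S stages.i -o stages.s    2. compile: C to assembly
 *   gcc -c stages.s -o stages.o    3. assemble: assembly to object file
 *   gcc stages.o driver.o -o app   4. link (driver.o is assumed to define main)
 */
#define SQUARE(x) ((x) * (x))   /* gone after step 1; only the expansion remains */

int square_plus_one(int n)
{
    return SQUARE(n) + 1;       /* becomes return ((n) * (n)) + 1; in stages.i */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;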

&lt;p&gt;We'll examine a simple C program. We'll stop its compilation after step 2 "Compilation" to examine the output assembly file, and after step 3 "Assembly" to examine the output object file.&lt;/p&gt;

&lt;h2&gt;
  
  
  C/C++ to Assembly
&lt;/h2&gt;

&lt;p&gt;I'll explain in the context of C. The same concepts apply to C++ as well.&lt;/p&gt;

&lt;p&gt;Take a look at the following C program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;defined_function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;undefined_function&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;defined_function&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;undefined_function&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We define a function called &lt;code&gt;defined_function&lt;/code&gt; with an empty body, we declare a function called &lt;code&gt;undefined_function&lt;/code&gt; without defining it, and we call both functions from the main function.&lt;/p&gt;

&lt;p&gt;Assume the program is in a file called &lt;code&gt;main.c&lt;/code&gt;. We compile it with gcc and pass the &lt;code&gt;-S&lt;/code&gt; option to stop after the compilation step, so we can examine the output assembly file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcc -S main.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output assembly file &lt;code&gt;main.s&lt;/code&gt; will be something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nasm"&gt;&lt;code&gt;    &lt;span class="nf"&gt;.file&lt;/span&gt;   &lt;span class="err"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;main.c&lt;/span&gt;&lt;span class="err"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;.text&lt;/span&gt;
    &lt;span class="nf"&gt;.globl&lt;/span&gt;  &lt;span class="nv"&gt;defined_function&lt;/span&gt;
    &lt;span class="nf"&gt;.type&lt;/span&gt;   &lt;span class="nv"&gt;defined_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;function&lt;/span&gt;
&lt;span class="nl"&gt;defined_function:&lt;/span&gt;
&lt;span class="nl"&gt;.LFB0:&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_startproc&lt;/span&gt;
    &lt;span class="nf"&gt;pushq&lt;/span&gt;   &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa_offset&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_offset&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
    &lt;span class="nf"&gt;movq&lt;/span&gt;    &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rsp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa_register&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
    &lt;span class="nf"&gt;nop&lt;/span&gt;
    &lt;span class="nf"&gt;popq&lt;/span&gt;    &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
    &lt;span class="nf"&gt;ret&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_endproc&lt;/span&gt;
&lt;span class="nl"&gt;.LFE0:&lt;/span&gt;
    &lt;span class="nf"&gt;.size&lt;/span&gt;   &lt;span class="nv"&gt;defined_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;.&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;defined_function&lt;/span&gt;
    &lt;span class="nf"&gt;.globl&lt;/span&gt;  &lt;span class="nv"&gt;main&lt;/span&gt;
    &lt;span class="nf"&gt;.type&lt;/span&gt;   &lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;function&lt;/span&gt;
&lt;span class="nl"&gt;main:&lt;/span&gt;
&lt;span class="nl"&gt;.LFB1:&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_startproc&lt;/span&gt;
    &lt;span class="nf"&gt;pushq&lt;/span&gt;   &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa_offset&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_offset&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;
    &lt;span class="nf"&gt;movq&lt;/span&gt;    &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rsp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa_register&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
    &lt;span class="nf"&gt;movl&lt;/span&gt;    &lt;span class="kc"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;eax&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt;    &lt;span class="nv"&gt;defined_function&lt;/span&gt;
    &lt;span class="nf"&gt;movl&lt;/span&gt;    &lt;span class="kc"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;eax&lt;/span&gt;
    &lt;span class="nf"&gt;call&lt;/span&gt;    &lt;span class="nv"&gt;undefined_function&lt;/span&gt;
    &lt;span class="nf"&gt;nop&lt;/span&gt;
    &lt;span class="nf"&gt;popq&lt;/span&gt;    &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="nb"&gt;rbp&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_def_cfa&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
    &lt;span class="nf"&gt;ret&lt;/span&gt;
    &lt;span class="nf"&gt;.cfi_endproc&lt;/span&gt;
&lt;span class="nl"&gt;.LFE1:&lt;/span&gt;
    &lt;span class="nf"&gt;.size&lt;/span&gt;   &lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;.&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;
    &lt;span class="nf"&gt;.ident&lt;/span&gt;  &lt;span class="s"&gt;"GCC: (GNU) 12.2.0"&lt;/span&gt;
    &lt;span class="nf"&gt;.section&lt;/span&gt;    &lt;span class="nv"&gt;.note.GNU&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;progbits&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before we continue, here is a quick assembly refresher. An assembly program contains a list of instructions that the CPU executes one by one. These include data movement instructions like &lt;code&gt;mov&lt;/code&gt;, arithmetic instructions like &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;sub&lt;/code&gt;, control transfer instructions like &lt;code&gt;jmp&lt;/code&gt; and &lt;code&gt;call&lt;/code&gt;, and many more.&lt;/p&gt;

&lt;p&gt;An assembly program can also contain labels of the form &lt;code&gt;label:&lt;/code&gt; at the beginning of a line. Writing a label this way defines it. A label is simply a name for that location in memory, and other instructions such as &lt;code&gt;jmp&lt;/code&gt; and &lt;code&gt;call&lt;/code&gt; can reference it. For example, &lt;code&gt;jmp label&lt;/code&gt; means jump to the memory location named by &lt;code&gt;label&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now let's examine the assembly program above. It contains many details we don't need, such as the &lt;code&gt;.cfi_*&lt;/code&gt; directives (call frame information used for stack unwinding). We'll focus only on the important parts.&lt;/p&gt;

&lt;p&gt;Notice the labels &lt;code&gt;defined_function:&lt;/code&gt; and &lt;code&gt;main:&lt;/code&gt;: these correspond to the defined functions in our C program. Also, notice the instructions &lt;code&gt;call defined_function&lt;/code&gt; and &lt;code&gt;call undefined_function&lt;/code&gt;: these correspond to the function calls in our C program. Notice also that there is no label defined for the undefined function &lt;code&gt;undefined_function&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Assembly to Object File
&lt;/h2&gt;

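&lt;p&gt;We now compile the C file with gcc using the &lt;code&gt;-c&lt;/code&gt; option. This option stops after the assembly step, so gcc compiles and assembles the file but does not link it:&lt;br&gt;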
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcc -c main.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, we can assemble the assembly file using the GNU assembler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;as main.s -o main.o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either approach produces the object file &lt;code&gt;main.o&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Object File Format
&lt;/h2&gt;

&lt;p&gt;You probably know that the assembly step translates assembly code into binary machine code. But is this machine code the only thing present in the object file? The answer is no.&lt;/p&gt;

&lt;p&gt;The object file also contains metadata about the program, such as section information, a symbol table and relocation information. On Linux, object files use a format called &lt;strong&gt;ELF (Executable and Linkable Format)&lt;/strong&gt;. Other formats exist, such as PE (Portable Executable), which Windows uses, and the older a.out format. In this post, we'll focus on ELF.&lt;/p&gt;

&lt;p&gt;How can we view the contents of an ELF file? We can use the &lt;code&gt;readelf&lt;/code&gt; utility, which is part of GNU binutils. For now we're only interested in the symbol table, so we run readelf on the object file with the &lt;code&gt;-s&lt;/code&gt; option. This option prints only the symbol table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readelf -s main.o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     7 FUNC    GLOBAL DEFAULT    1 defined_function
     4: 0000000000000007    27 FUNC    GLOBAL DEFAULT    1 main
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND undefined_function
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's analyze this output.&lt;/p&gt;

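&lt;p&gt;Notice that there are symbols for &lt;code&gt;defined_function&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt;, and their Ndx is a number, which means they are defined. Ndx is the index of the section that contains the symbol. Here that is section 1, &lt;code&gt;.text&lt;/code&gt;, as entry 2 shows. In an object file, Value is the symbol's offset within that section. &lt;code&gt;defined_function&lt;/code&gt; starts at offset 0 and is 7 bytes long, so &lt;code&gt;main&lt;/code&gt; starts at offset 7. The assembler creates a defined symbol in the symbol table for each label defined in the assembly code, except for assembler-local labels that start with &lt;code&gt;.L&lt;/code&gt;, such as &lt;code&gt;.LFB0&lt;/code&gt;. Because there are lines beginning with &lt;code&gt;defined_function:&lt;/code&gt; and &lt;code&gt;main:&lt;/code&gt; in the assembly code, they are defined symbols in the symbol table.&lt;/p&gt;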

&lt;p&gt;Notice also that there is a symbol for &lt;code&gt;undefined_function&lt;/code&gt;, and its Ndx is UND which means undefined. The assembler creates an undefined symbol in the symbol table for each label referenced in instructions but not defined. Because &lt;code&gt;undefined_function&lt;/code&gt; is referenced in an instruction (&lt;code&gt;call undefined_function&lt;/code&gt;) and there is no line beginning with &lt;code&gt;undefined_function:&lt;/code&gt; in the assembly code, it is an undefined symbol in the symbol table.&lt;/p&gt;
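
&lt;p&gt;In practice you should use readelf, but the defined-or-undefined check it shows is easy to reproduce. Below is a minimal sketch, not readelf's actual implementation. It assumes a 64-bit ELF file whose byte order matches the host, and it uses the &lt;code&gt;elf.h&lt;/code&gt; header shipped with glibc. The function name &lt;code&gt;find_symbol&lt;/code&gt; is hypothetical. The function walks the section headers to the &lt;code&gt;.symtab&lt;/code&gt; section and follows its link to the string table that holds the symbol names. It then reports whether the matching symbol's section index is &lt;code&gt;SHN_UNDEF&lt;/code&gt;, which readelf prints as UND.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;elf.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Looks up `name` in the .symtab section of a 64-bit ELF file.
 * Returns 1 if the symbol is defined (st_shndx holds a section index),
 * 0 if it is undefined (st_shndx is SHN_UNDEF, printed as UND by readelf),
 * and -1 if the file can't be read, isn't ELF64, or has no such symbol.
 * Assumes the file's byte order matches the host; bounds checks are minimal. */
int find_symbol(const char *path, const char *name)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Read the whole file into memory. */
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    rewind(fp);
    unsigned char *buf = size &amp;gt; 0 ? malloc((size_t)size) : NULL;
    if (buf == NULL || fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        fclose(fp);
        free(buf);
        return -1;
    }
    fclose(fp);

    /* Check the ELF magic bytes and the 64-bit class. */
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)buf;
    if ((size_t)size &amp;lt; sizeof *ehdr
        || memcmp(ehdr-&amp;gt;e_ident, ELFMAG, SELFMAG) != 0
        || ehdr-&amp;gt;e_ident[EI_CLASS] != ELFCLASS64) {
        free(buf);
        return -1;
    }

    int result = -1;
    const Elf64_Shdr *shdrs = (const Elf64_Shdr *)(buf + ehdr-&amp;gt;e_shoff);
    for (int s = 0; s &amp;lt; ehdr-&amp;gt;e_shnum; s++) {
        if (shdrs[s].sh_type != SHT_SYMTAB)
            continue;
        /* A symbol table's sh_link field is the index of its string table. */
        const char *strtab = (const char *)(buf + shdrs[shdrs[s].sh_link].sh_offset);
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + shdrs[s].sh_offset);
        size_t count = shdrs[s].sh_size / sizeof(Elf64_Sym);
        for (size_t i = 0; i &amp;lt; count; i++) {
            if (strcmp(strtab + syms[i].st_name, name) == 0) {
                result = syms[i].st_shndx == SHN_UNDEF ? 0 : 1;
                break;
            }
        }
    }
    free(buf);
    return result;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Based on the readelf output above, &lt;code&gt;find_symbol("main.o", "undefined_function")&lt;/code&gt; should return 0 and &lt;code&gt;find_symbol("main.o", "defined_function")&lt;/code&gt; should return 1.&lt;/p&gt;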

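&lt;p&gt;Also, notice that our three symbols &lt;code&gt;defined_function&lt;/code&gt;, &lt;code&gt;undefined_function&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; have a Bind of GLOBAL, which means they are global symbols. The two defined functions are global because of the &lt;code&gt;.globl&lt;/code&gt; directives in the assembly code. &lt;code&gt;undefined_function&lt;/code&gt; is global because the GNU assembler treats undefined symbols as global. This matters because the linker considers only global symbols when it resolves references between files. A LOCAL symbol is visible only inside its own object file.&lt;/p&gt;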
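
&lt;p&gt;The &lt;code&gt;static&lt;/code&gt; keyword makes the difference between GLOBAL and LOCAL binding easy to see. In this minimal sketch, the file and function names are hypothetical. Compile it with &lt;code&gt;gcc -c binding.c&lt;/code&gt; at the default optimization level, then run &lt;code&gt;readelf -s binding.o&lt;/code&gt;. The output should show &lt;code&gt;public_entry&lt;/code&gt; as GLOBAL and &lt;code&gt;helper&lt;/code&gt; as LOCAL, because gcc emits a &lt;code&gt;.globl&lt;/code&gt; directive only for the function with external linkage.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* binding.c: compare the Bind column for these two functions with
 *     gcc -c binding.c
 *     readelf -s binding.o
 * At -O0, helper() gets a LOCAL symbol. Another object file that calls
 * helper() cannot link against this definition, so it would get an
 * undefined reference. (With optimization, gcc may inline helper() and
 * drop its symbol entirely.) */
static int helper(int x)
{
    return x * 2;
}

/* External linkage: gcc emits .globl public_entry, so the symbol is GLOBAL. */
int public_entry(int x)
{
    return helper(x) + 1;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;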

&lt;h2&gt;
  
  
  Executables and Library Files
&lt;/h2&gt;

&lt;p&gt;In the final stage of compilation, object and library files are linked together to produce an executable (or a shared library). An executable file is a file with an entry point. A library is a collection of object files that provide functions for other programs to use, and it has no entry point. There are two types of libraries: static and dynamic libraries. In this post, we're only interested in static libraries. On Linux, static libraries have a .a extension and are sometimes called static archives. A static library is not itself an ELF file. It is an archive created with the &lt;code&gt;ar&lt;/code&gt; tool, and each object file inside it is an ELF file.&lt;/p&gt;
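
&lt;p&gt;Here is a minimal sketch of building and using a static library, assuming gcc and GNU ar on Linux. All names are made up: &lt;code&gt;mathlib.c&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;libmathlib.a&lt;/code&gt; and the client file &lt;code&gt;user.c&lt;/code&gt;. The comment lists the command for each step. The &lt;code&gt;-lmathlib&lt;/code&gt; option makes the linker search for &lt;code&gt;libmathlib.a&lt;/code&gt;, and &lt;code&gt;-L.&lt;/code&gt; adds the current directory to its search path.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* mathlib.c: a hypothetical one-function library.
 * Compile it, then pack the object file into a static archive:
 *     gcc -c mathlib.c -o mathlib.o
 *     ar rcs libmathlib.a mathlib.o
 * List the archive's members, then print each member's ELF symbol table:
 *     ar t libmathlib.a
 *     readelf -s libmathlib.a
 * Link a client that calls add() (user.c is hypothetical); the library
 * goes after the file that uses it:
 *     gcc user.c -L. -lmathlib -o user
 * Leaving out -lmathlib is case 2 of the TL;DR: undefined reference to 'add'. */
int add(int a, int b)
{
    return a + b;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;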

&lt;h2&gt;
  
  
  Linking
&lt;/h2&gt;

&lt;p&gt;The input to linking is object files and library files. The linker reads the global symbols of its input files. For each undefined symbol, it looks for a defined symbol with the same name in another input file. If every undefined symbol has a matching defined symbol, linking succeeds. Otherwise, the linker issues the error &lt;em&gt;undefined reference to 'symbol'&lt;/em&gt; for each undefined symbol that has no match. Static libraries get special treatment: the GNU linker pulls in only the archive members that define a symbol that is still undefined when it reaches the library. This is why a library should come after the files that use it on the command line.&lt;/p&gt;
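
&lt;p&gt;The resolution rule described above fits in a few lines of code. This is a toy sketch, not how a real linker is implemented. Each hypothetical &lt;code&gt;struct object_file&lt;/code&gt; only lists the names of the global symbols it defines and the ones it leaves undefined. The model ignores archives, weak symbols, duplicate definitions and relocations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Toy model of one input file: the global symbols it defines and the ones it
 * leaves undefined. Each list holds at most 7 names and is NULL-terminated
 * (hypothetical, simplified). */
struct object_file {
    const char *name;
    const char *defined[8];
    const char *undefined[8];
};

/* Returns 1 if any input file defines a global symbol called sym. */
static int defined_anywhere(const char *sym, const struct object_file *files, int nfiles)
{
    for (int f = 0; f &amp;lt; nfiles; f++)
        for (int i = 0; files[f].defined[i] != NULL; i++)
            if (strcmp(files[f].defined[i], sym) == 0)
                return 1;
    return 0;
}

/* Prints one error per unresolved reference and returns the error count;
 * 0 means linking could proceed. */
int resolve_symbols(const struct object_file *files, int nfiles)
{
    int errors = 0;
    for (int f = 0; f &amp;lt; nfiles; f++)
        for (int i = 0; files[f].undefined[i] != NULL; i++)
            if (!defined_anywhere(files[f].undefined[i], files, nfiles)) {
                printf("%s: undefined reference to '%s'\n",
                       files[f].name, files[f].undefined[i]);
                errors++;
            }
    return errors;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The article's example can be modeled as a &lt;code&gt;main.o&lt;/code&gt; entry that defines &lt;code&gt;defined_function&lt;/code&gt; and &lt;code&gt;main&lt;/code&gt; and leaves &lt;code&gt;undefined_function&lt;/code&gt; undefined. Passed alone, it produces one error. Adding a second entry that defines &lt;code&gt;undefined_function&lt;/code&gt; brings the count to 0.&lt;/p&gt;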

&lt;p&gt;In our example, if the input to the linker is only &lt;code&gt;main.o&lt;/code&gt;, the linker will issue the following error:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;undefined reference to 'undefined_function'&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is because &lt;code&gt;undefined_function&lt;/code&gt; is an undefined symbol and there is no defined symbol with the same name.&lt;/p&gt;

&lt;p&gt;To fix this error, do one of the following: define &lt;code&gt;undefined_function&lt;/code&gt; in &lt;code&gt;main.c&lt;/code&gt;, compile and link &lt;code&gt;main.c&lt;/code&gt; together with another C file that defines &lt;code&gt;undefined_function&lt;/code&gt;, or link against a library that defines &lt;code&gt;undefined_function&lt;/code&gt;.&lt;/p&gt;
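
&lt;p&gt;For the second option, the extra file might look like this minimal sketch. The file name &lt;code&gt;other.c&lt;/code&gt; and the call counter are hypothetical. The original program only needs &lt;code&gt;undefined_function&lt;/code&gt; to exist, and the counter just gives it observable behavior. After &lt;code&gt;gcc -c other.c&lt;/code&gt;, &lt;code&gt;readelf -s other.o&lt;/code&gt; should list &lt;code&gt;undefined_function&lt;/code&gt; with a section number in the Ndx column instead of UND. Linking it with &lt;code&gt;main.o&lt;/code&gt; therefore resolves the reference.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;/* other.c (hypothetical): supplies the definition that main.c only declares.
 * Build both source files in one command:
 *     gcc main.c other.c -o main
 * or link the separately compiled object files:
 *     gcc -c other.c
 *     gcc main.o other.o -o main
 */
static int call_count;   /* static, so its symbol is LOCAL to other.o */

void undefined_function(void)
{
    call_count++;        /* placeholder body: just record the call */
}

/* Hypothetical helper for checking how many times the function ran. */
int undefined_function_calls(void)
{
    return call_count;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;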

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In summary, each function defined in an input C file gets a defined symbol in the output object file. Each function that is declared and called but not defined gets an undefined symbol. During linking, if an undefined symbol has no defined symbol with the same name in any other input file, the linker issues the &lt;em&gt;undefined reference&lt;/em&gt; error.&lt;/p&gt;

&lt;p&gt;That's it. I hope you now really understand why this error occurs and I hope you have also gained insight into the content of object files and the linking process 🙂&lt;/p&gt;

</description>
      <category>c</category>
      <category>cpp</category>
      <category>assembly</category>
      <category>elf</category>
    </item>
  </channel>
</rss>
