DEV Community

ik_5

Golang and shared objects Part 1 - The background

When you build an application in Go, the result is usually a monolithic executable that holds "everything" inside.

When some logic needs to be separated out for execution, I find that many software architects reach for the actor model or workers.

I do think there is room for the actor model and for workers, and I personally use them. But as software developers we have many tools at our disposal, and I feel that some of them are not used, or not even considered, in places where they could help. That is the trigger for this series of posts.

One tool that is often overlooked is the shared object, a.k.a. the shared library.

This series of blog posts is about shared objects/libraries. In the next posts I will use Go (and sometimes C) to explain how things are done, but the basic concepts are the same for most programming languages. Even most dynamic languages (such as Python and Ruby) use the same concepts, only with different syntax or tooling.

What is it all about

Shared Object

In a nutshell:

A shared object/library is a binary file that carries a table describing its functions, so that they can be loaded dynamically or statically.

There are two types of shared objects:

  1. Dynamic.
  2. Static.

As I wrote at the beginning of the post, the static version is what Go usually uses. A static library's functions are copied into the executable at build time, so they become part of the program as if we had written them for that program ourselves.

A dynamic library uses the same file format as an executable on most systems, but it normally has no program entry point of its own. What it offers instead is a table that maps names to function addresses, which is consulted when a function is looked up.

We know dynamic libraries by their file extensions: .dll (Windows), .so (Linux and most Unix systems) and .dylib (macOS).

In this post, I'll only talk about dynamic libraries. You can find more information about static libraries at this link.
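As a small illustration of the file extensions mentioned above, here is a minimal Go sketch that builds the conventional file name of a dynamic library for a given operating system. The function name sharedLibName and the library name "greeter" are made up for this example, and the "lib" prefix is only a common naming convention, not a rule.

```go
package main

import (
	"fmt"
	"runtime"
)

// sharedLibName returns the conventional file name of a dynamic library
// called name on the given operating system (a GOOS value such as "linux").
// The "lib" prefix is a naming convention assumed for this sketch.
func sharedLibName(goos, name string) string {
	switch goos {
	case "windows":
		return name + ".dll"
	case "darwin":
		return "lib" + name + ".dylib"
	default: // Linux, the BSDs and most other Unix systems
		return "lib" + name + ".so"
	}
}

func main() {
	// "greeter" is a hypothetical library name used only for illustration.
	fmt.Println(sharedLibName(runtime.GOOS, "greeter"))
}
```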

Loading of a library

A library can be used in two main ways:

  1. Static loading
  2. Dynamic loading

Static Loading

The main use of a library is to provide reusable functions.
The functions do have their own memory addresses, but the library is built so that they can be called as if they were part of the application that uses them.

In order to do that, a programming language needs to share the function name and its arguments.

A function name (as well as some additional data structures) is called a symbol. A symbol holds a unique identifier used for access, and carries a "human readable" name.

In order to use a symbol, the caller needs to know how the function's arguments and return values are laid out in memory, and in what order.

The rules for how a signature and its arguments are laid out at the binary level are called the ABI - Application Binary Interface.

Once a developer knows all of that information, they can instruct a linker on how to connect a library and its function(s) to the executable or to another library.

This happens at build time (after compilation). When the executable is ready, it contains instructions for the Operating System (OS) describing which libraries must be loaded with the process as its dependencies.

The OS loads the library when it starts the process (program), mapping it into the same address space as the rest of the process. The library's memory addresses are therefore virtual rather than physical.

A library can be loaded many times across the entire system, once for each process or library that uses it. Each process gets its own mapping of the library at its own virtual address, and that mapping is private: unrelated processes and shared libraries cannot access it.

On Linux (and many other Unix systems), it is easy to see the libraries that were linked this way by using the ldd command:

$ ldd main
        linux-vdso.so.1 (0x00007fffed2c9000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f5321c21000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f5321e74000)
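Part of what ldd reports can also be read from Go. The following is a minimal sketch, assuming an ELF file (the format used on Linux), that uses the standard debug/elf package to list the library names an executable asks the OS to load. Unlike ldd, it does not resolve paths or follow the dependencies of those libraries; the function name importedLibraries is my own choice for the example.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// importedLibraries returns the names of the shared libraries that the ELF
// file at path asks the OS loader to load (its DT_NEEDED entries).
func importedLibraries(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	// Default to inspecting this program itself; a path can be given instead.
	path, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	libs, err := importedLibraries(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, lib := range libs {
		fmt.Println(lib)
	}
}
```

Running it against a typical pure Go binary will usually print nothing, which matches the point made earlier that Go links statically most of the time.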

Dynamic Loading

With static loading, the linker and the OS connect a library function to a memory address, but this requires binding the shared library to a process or to another library ahead of time. That is limiting when writing extensions, for example.

Another way to connect such a function is at run-time.
Run-time means that the process itself loads the library into memory and has to handle the entire ABI at run-time, including freeing the memory that was allocated and dealing with any issues that can arise from going that way.

Because the information is dynamic, it is impossible to know beforehand which library will be linked, unlike with static loading.

Loading of a function

As mentioned above, a function in a shared library/object is called a symbol.

A symbol is a unique identifier for a function, variable, constant, etc.
It is something that the executing process knows how to access once it finds the name, and whose memory address it can find using the symbol table.

The symbol table holds the symbol's name, its type, and the memory position used to attach to it at execution in line with the ABI.

Depending on the format, it can hold additional information, or provide that information in a slightly different way, but that does not matter for this post.

When a function is called, the call is translated to a symbol name; the symbol is looked up and then used according to its memory address and type.

There are two ways in doing so:

  1. Static loading
  2. Dynamic loading

Static loading

For static loading of a function, the main code needs to declare a function with the same ABI, and that declaration will hold the address of the function in the library.

This is done after "compile time": the linker places the memory address of the library symbol into the function reference in the executable (or in the other library that uses it).

Using the objdump command on Linux, it is possible to see the linked symbols and their usage (here for a program I wrote in C):

$ objdump -t ./main

./main:     file format elf64-x86-64 

SYMBOL TABLE:
...
0000000000000000       F *UND*  0000000000000000              printf@@GLIBC_2.2.5
0000000000000000       F *UND*  0000000000000000              __libc_start_main@@GLIBC_2.2.5                     
0000000000000000       F *UND*  0000000000000000              g_uri_parse_scheme        
...
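The "F *UND*" entries above are function symbols that the executable uses but does not define itself. A similar list can be sketched in Go with the standard debug/elf package. This is only an approximation of what objdump prints, assuming an ELF file; the helper names undefinedFuncs and undefinedFuncsIn are invented for the example.

```go
package main

import (
	"debug/elf"
	"errors"
	"fmt"
	"os"
)

// undefinedFuncs returns the names of function symbols that belong to no
// section, i.e. the ones objdump prints as "F *UND*".
func undefinedFuncs(syms []elf.Symbol) []string {
	var names []string
	for _, s := range syms {
		if s.Section == elf.SHN_UNDEF && elf.ST_TYPE(s.Info) == elf.STT_FUNC {
			names = append(names, s.Name)
		}
	}
	return names
}

// undefinedFuncsIn reads the symbol table of the ELF file at path and
// returns its undefined function symbols.
func undefinedFuncsIn(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()

	syms, err := f.Symbols()
	if errors.Is(err, elf.ErrNoSymbols) {
		return nil, nil // stripped file: nothing to report
	}
	if err != nil {
		return nil, fmt.Errorf("read symbols: %w", err)
	}
	return undefinedFuncs(syms), nil
}

func main() {
	// Default to inspecting this program itself; a path can be given instead.
	path, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	names, err := undefinedFuncsIn(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```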

Dynamic loading

Dynamic loading of a function works like dynamic loading of a library, and is actually part of it.
The dynamic loading happens at run-time: the linking is done dynamically at the code level rather than at link time, which provides a more flexible way of expanding a process's capabilities when needed.

A function variable or a memory address, a.k.a. a pointer (depending on the programming language), is assigned to hold the loaded symbol.

On Unix systems, the main library that provides system functions is named libc, the C Standard Library. Most libc implementations also ship a set of functions for loading libraries dynamically. These functions are not part of the C standard itself but an extension, because dynamic loading is a service of the operating system rather than something the C language defines.

On most Unix systems, the API for loading symbols dynamically is the dlopen family of functions (dlopen, dlsym, dlclose and dlerror), declared in dlfcn.h.

On MS-Windows there is a different but similar API, declared in libloaderapi.h (LoadLibrary, GetProcAddress and FreeLibrary), which also contains additional functionality that Windows provides.
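To make the idea of "assigning a loaded symbol to a function" more concrete, here is a minimal sketch using Go's standard plugin package, which is one way (not the only one) to load a shared object at run-time from Go. The plugin path, the symbol name Greet and its signature func(string) string are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// loadGreet opens the Go plugin at path at run-time, looks up the symbol
// named "Greet" and checks that it has the signature the caller expects.
// The symbol name and signature are assumptions made for this sketch.
func loadGreet(path string) (func(string) string, error) {
	p, err := plugin.Open(path) // load the shared object into this process
	if err != nil {
		return nil, fmt.Errorf("open plugin: %w", err)
	}
	sym, err := p.Lookup("Greet") // find the symbol by its name
	if err != nil {
		return nil, fmt.Errorf("lookup Greet: %w", err)
	}
	greet, ok := sym.(func(string) string) // the caller must know the signature
	if !ok {
		return nil, fmt.Errorf("symbol Greet has unexpected type %T", sym)
	}
	return greet, nil
}

func main() {
	path := "./greeter.so" // hypothetical plugin file
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	greet, err := loadGreet(path)
	if err != nil {
		// The extension is optional here, so report the problem and carry on.
		fmt.Fprintln(os.Stderr, "plugin not loaded:", err)
		return
	}
	fmt.Println(greet("world"))
}
```

Assuming a separate package main that exports func Greet(name string) string, the plugin itself could be built with go build -buildmode=plugin -o greeter.so. The plugin package is only supported on some platforms (Linux, FreeBSD and macOS according to its documentation), so treat this as a sketch rather than a portable recipe.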

Summary

In this blog post I explained what a shared object, also known as a shared library, is.

I explained that there are two types of shared objects:

  1. Static shared objects.
  2. Dynamic shared objects.

I explained the differences in a nutshell, and focused on dynamic shared objects.

I also explained that there are two methods of using them:

  1. Static loading.
  2. Dynamic loading.

And, in a nutshell, what each one means.

What comes next?

In the next part I will start writing some code, with examples of how to write a shared library in both C and Go.

The 3rd part will show how to use them from both C and Go.

The 4th part will explain how to create special cases of shared libraries (e.g. Go's plugins).

Top comments (5)

chris

Hi! Nice article!

I was wondering, if you build a binary with dynamic linking, does it impact its portability? For example, if you build a program on your computer with dynamically linked libraries and run it on another computer where the libraries are not necessarily present, will it break?

ik_5

Hi,

Thank you.

Yes, the program will fail to load.
Another issue is that a different version of a library might also break it, because an address or the ABI was changed.

When it fails, you as a process do not have the ability to control it; only the OS does.

chris

Ok, I see, thank you for your answer. Then what would be the advantage of using shared objects? My understanding of Go was that it was quite practical to run on different OSes without library problems; this seems to go against that logic.

ik_5

As I wrote in the first part, it is a means to share code that is external to the system.

Go uses, most of the time, static libraries that are compiled in.
So you are stuck with what you built the binary with.

If you have a bug in a library, you still need to rebuild the binary.

With a shared object you can just deploy the fixed library (something that MS does most of the time).

It is also a way to create plugins. Let's say you have an RPC interface, and you want to be open to many ways of communicating: sometimes gRPC, sometimes REST, pure TCP or whatever.

You can achieve that using a shared object, and load it at run-time (it acts a bit differently than at compile time) based on a configuration or need.

For example, I have a system that communicates with 4 different types of vendors, each communicating with a different protocol. But my system does not need to support all of them at the same time, so only the ones I do need to support are loaded, using configuration and a library.

chris

Oh! That is really interesting. I never thought about doing it that way. Thank you!