Zul Ikram Musaddik Rayat

Posted on Jun 22

Rust Dynamic Library: Beyond Monolithic Executables

#rust #c

By default, Rust bundles everything into one single executable. This is a massive selling point for the language. You run your build command, and you get a neat, standalone binary that you can drop onto a server or send to a user with no Rust runtime or crate dependencies to install alongside it.

But what if you need a modular plugin system? What if you are building an application with local AI features and need to hot-swap heavy hardware-acceleration modules at runtime based on the user's setup, moving between a basic CPU runner, a heavy NVIDIA CUDA worker, or a Vulkan compute pipeline?

Statically linking every single heavy hardware framework into a monolithic binary makes compilation agonizing and forces your users to download gigabytes of heavy dependencies for graphics cards they do not even own. For these architectures, shipping a single giant executable fails. You need a way to load isolated pieces of code at runtime, and to do that, you need to understand how Rust handles dynamic libraries down to the bytes.
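
To make the goal concrete, here is a minimal sketch of the selection step a host might run before loading anything. The `HardwareProbe` struct, the priority order, and the library stems are all hypothetical names invented for illustration; the article does not prescribe them.

```rust
/// Hypothetical summary of what the host detected on the user's machine.
#[derive(Debug, Clone, Copy)]
pub struct HardwareProbe {
    pub has_cuda: bool,
    pub has_vulkan: bool,
}

/// The three backends named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Cuda,
    Vulkan,
}

/// Prefer CUDA, then Vulkan, and fall back to the CPU runner.
/// This priority order is an assumption made for the sketch.
pub fn pick_backend(probe: HardwareProbe) -> Backend {
    if probe.has_cuda {
        Backend::Cuda
    } else if probe.has_vulkan {
        Backend::Vulkan
    } else {
        Backend::Cpu
    }
}

/// Hypothetical library stem for each backend. Only the selected
/// file would need to exist on the user's machine.
pub fn plugin_stem(backend: Backend) -> &'static str {
    match backend {
        Backend::Cpu => "backend_cpu",
        Backend::Cuda => "backend_cuda",
        Backend::Vulkan => "backend_vulkan",
    }
}
```

Calling `plugin_stem(pick_backend(probe))` yields the one module to load, so a user without an NVIDIA card never has to download the CUDA build.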

The Four Pillars of Rust Crate Types

To break an application apart, you first have to tell the compiler what kind of file it is actually making by configuring crate-type in your Cargo.toml.

1. The Static Archives: rlib and staticlib

rlib (Rust Library): The default output format. It is a static archive packed with compiled code and heavy internal compiler metadata. Only the Rust compiler can consume it, and only the same compiler version that produced it, during a normal static compilation process.
staticlib (System Static Library): This compiles your code into a standard platform-native static archive (.lib or .a). It strips away Rust metadata and bundles dependencies so that a traditional C or C++ compiler can link against it statically.

2. The Dynamic Binaries: dylib vs cdylib
When you want to load external files at runtime, you have to choose between two completely different dynamic strategies:

dylib (Rust Dynamic Library): This creates a dynamic library (.dll, .so, or .dylib) meant to be shared strictly among other Rust programs built with the same compiler. It is a deployment nightmare for two reasons. First, stable rustc does not support Link Time Optimization (LTO) for dylib outputs. Second, in a typical setup it links the standard library dynamically instead of bundling it. If you ship a dylib, you must also ship the exact, matching std-<hash> shared library (for example std-<hash>.dll on Windows) that your specific toolchain provides, or it will refuse to load.
cdylib (C-Compatible Dynamic Library): This produces a standard system dynamic library optimized for isolation. It strips out internal compiler metadata and bundles the standard library directly into the file. Because it is a self-contained asset, you do not have to worry about missing standard library hashes on the user's machine. It also fully supports LTO inside its own compilation boundary, yielding a highly optimized, lean binary.
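
As a rough guide to what each crate type leaves on disk, the sketch below maps a crate type to the file name rustc conventionally produces on the current platform. The `CrateType` enum and `artifact_name` function are hypothetical helpers; the prefix and suffix constants come from `std::env::consts`. Cargo may add a hash to names inside `target/*/deps`, which this ignores.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// The four library crate types discussed above.
#[derive(Debug, Clone, Copy)]
pub enum CrateType {
    Rlib,
    Staticlib,
    Dylib,
    Cdylib,
}

/// Conventional output file name for `crate_name` on the current platform.
pub fn artifact_name(kind: CrateType, crate_name: &str) -> String {
    match kind {
        // Rust-only static archive, same naming on every platform.
        CrateType::Rlib => format!("lib{}.rlib", crate_name),
        // Native static archive: `foo.lib` with MSVC, `libfoo.a` elsewhere.
        CrateType::Staticlib => {
            if cfg!(target_env = "msvc") {
                format!("{}.lib", crate_name)
            } else {
                format!("lib{}.a", crate_name)
            }
        }
        // Both dynamic types share the platform's shared-library naming,
        // e.g. `libfoo.so`, `libfoo.dylib`, or `foo.dll`.
        CrateType::Dylib | CrateType::Cdylib => {
            format!("{}{}{}", DLL_PREFIX, crate_name, DLL_SUFFIX)
        }
    }
}
```

Note that dylib and cdylib are indistinguishable by file name; the difference lies in what is exported from the file and what it expects to find next to it at load time.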

The Orthogonality of ABIs and Crate Types

The biggest point of confusion here is a false assumption that choosing a cdylib means you are forced to use the standard C ABI. It does not. Crate types and ABIs are completely orthogonal concepts.

The crate type defines the file container format and how symbols are hidden or exposed on disk. The ABI (Application Binary Interface) dictates the machine-level rules: how arguments are placed into registers, how memory structures are padded, and how the call stack behaves.

extern "C" (The C ABI): The universal industry standard. It is stable across different languages and compiler versions. The massive downside is that you can only pass C-compatible types: primitives, raw pointers, and #[repr(C)] structs. You lose the ability to easily send high-level Rust structs, enums, closures, or trait objects without tedious manual marshalling.
extern "Rust" (The Rust ABI): The native calling convention of the Rust compiler. It supports our rich, expressive type system out of the box. The catch is that the Rust ABI is explicitly unstable. Type layouts and calling conventions are unspecified and can change between toolchain versions or with different compiler flags.

But because container types and ABIs are completely separate, you can absolutely expose an extern "Rust" function from a clean, isolated cdylib container.
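
The difference between the two ABIs is easiest to see side by side. This is a minimal sketch with hypothetical function names: the same string-length logic written once against the C ABI and once against the Rust ABI. In a real plugin each function would also carry the `#[no_mangle]` export attribute, omitted here so the snippet stays a plain in-process example.

```rust
use std::slice;
use std::str;

/// C ABI version: a string must cross the boundary as a raw pointer plus a
/// length, and the callee rebuilds the `&str` by hand.
///
/// # Safety
/// `ptr` must point to `len` readable bytes that stay alive for the call.
pub unsafe extern "C" fn payload_len_c(ptr: *const u8, len: usize) -> usize {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    match str::from_utf8(bytes) {
        Ok(text) => text.chars().count(),
        // Invalid UTF-8 is reported as zero characters in this sketch.
        Err(_) => 0,
    }
}

/// Rust ABI version: the same logic using ordinary Rust types.
pub extern "Rust" fn payload_len_rust(text: &str) -> usize {
    text.chars().count()
}
```

The crate type plays no part in either signature, which is the point: the same two functions could be compiled into an rlib, a dylib, or a cdylib.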

Designing a Safe Pure-Rust Plugin System

In my own desktop application, I needed a way to dynamically load localized compute engines. The core user interface of the app is completely lightweight, but the execution modules are packed with massive dependencies depending on whether they target a CPU, an NVIDIA CUDA pipeline, or a Vulkan backend.

Why I Chose `extern "Rust"` Inside a `cdylib`

Most guides online tell you to use extern "C". But I was not building a generic public library meant to be imported by foreign languages like Python or Node.js. I was building a local application architecture where I owned the code for both the main host app and the internal feature backends.

If I had used extern "C", I would have spent days writing brittle pointer translations, handling unsafe memory chunks manually, and tearing down native closures just to pass a simple data stream across the boundary.

By picking extern "Rust" inside a clean cdylib wrapper, I kept the ease of standard Rust code. To make this workable despite an unstable compiler ABI, I introduced a strict rule into my development workflow: The host application and the dynamic backend modules must be compiled out of the exact same repository workspace, utilizing the identical compiler version, profile configuration, and target flags.

Because I completely control the build pipeline, both sides are produced by the same compiler from the same compute_api source, so type definitions, calling conventions, and virtual table structures line up on both sides of the boundary. Rust does not formally guarantee this, but in practice the synchronization removes the main risk of an unstable ABI. One caveat remains: each cdylib carries its own copy of the standard library, so both sides must use the same global allocator (the default system allocator satisfies this) for a Box created in the plugin to be freed safely by the host.
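
The same-build rule rests on discipline, so it can help to have the host check it before calling into a plugin. The sketch below is one hypothetical way to do that, not part of the original design: `BuildFingerprint` and `check_same_build` are invented names, and in a real workspace the fields would have to be filled in at build time (for example by a build script) and read from the plugin through a small extern "C" function so the check itself does not depend on the Rust ABI.

```rust
/// Hypothetical record of how a crate was built.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BuildFingerprint {
    pub rustc_version: String,
    pub profile: String,
    pub target: String,
}

/// Refuse to proceed unless host and plugin report identical builds.
pub fn check_same_build(
    host: &BuildFingerprint,
    plugin: &BuildFingerprint,
) -> Result<(), String> {
    if host == plugin {
        Ok(())
    } else {
        Err(format!("build mismatch: host {host:?}, plugin {plugin:?}"))
    }
}
```

A host would run this check right after opening the library and bail out with the error instead of invoking the factory.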

Here is exactly how the architecture is constructed.

1. The Intermediary Interface Layer (rlib)

First, I created a lightweight shared interface crate called compute_api. Both the host application and the hardware backends import this shared contract.

// In compute_api/src/lib.rs
pub trait ComputeEngine: Send {
    fn engine_kind(&self) -> &'static str;

    fn stream_data(
        &self,
        payload: &str,
        cycles: usize,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<String, String>;
}
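
Because the contract is an ordinary trait, host-side logic can be exercised without loading any dynamic library at all. The sketch below repeats the trait so it is self-contained and adds a hypothetical in-process `EchoEngine` plus a `count_tokens` helper; neither exists in the original project.

```rust
/// The shared contract, repeated here so the sketch compiles on its own.
pub trait ComputeEngine: Send {
    fn engine_kind(&self) -> &'static str;

    fn stream_data(
        &self,
        payload: &str,
        cycles: usize,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<String, String>;
}

/// Hypothetical stand-in engine: emits the payload once per cycle.
pub struct EchoEngine;

impl ComputeEngine for EchoEngine {
    fn engine_kind(&self) -> &'static str {
        "echo_test_engine"
    }

    fn stream_data(
        &self,
        payload: &str,
        cycles: usize,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<String, String> {
        if payload.is_empty() {
            return Err("empty payload".to_string());
        }
        for _ in 0..cycles {
            on_token(payload);
        }
        Ok(format!("echoed {cycles} times"))
    }
}

/// Host-side helper that only knows about the trait, not the backend.
pub fn count_tokens(
    engine: &dyn ComputeEngine,
    payload: &str,
    cycles: usize,
) -> Result<usize, String> {
    let mut seen = 0;
    engine.stream_data(payload, cycles, &mut |_token: &str| seen += 1)?;
    Ok(seen)
}
```

`count_tokens` works identically whether the `&dyn ComputeEngine` comes from `EchoEngine` or from a dynamically loaded backend, which keeps the unsafe loading code small and the rest testable.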

2. The Dynamic Compute Module (cdylib)

Next, I built a specialized backend engine module. In its Cargo.toml, I declared crate-type = ["cdylib"] so it compiles down into an isolated system asset with full LTO support, completely avoiding the distribution bloat of dylib targets.

Inside, I implemented our interface trait and exposed an un-mangled factory function using the native extern "Rust" ABI.

// In hardware_backend/src/lib.rs
use compute_api::ComputeEngine;

struct AcceleratedEngine {
    configuration_profile: String,
}

impl ComputeEngine for AcceleratedEngine {
    fn engine_kind(&self) -> &'static str {
        "hardware_accelerated_backend"
    }

    fn stream_data(
        &self,
        payload: &str,
        _cycles: usize,
        on_token: &mut dyn FnMut(&str),
    ) -> Result<String, String> {
        // High-performance execution happens here
        on_token("Processing payload packet...");
        Ok(format!("Streamed '{}' using profile: {}", payload, self.configuration_profile))
    }
}

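// The native Rust ABI factory point. `#[no_mangle]` keeps the symbol name
// predictable for the loader (on edition 2024, write `#[unsafe(no_mangle)]`).
#[no_mangle]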
pub extern "Rust" fn load_engine_interface(config: &str) -> Result<Box<dyn ComputeEngine>, String> {
    Ok(Box::new(AcceleratedEngine {
        configuration_profile: config.to_string(),
    }))
}

3. The Runtime Host Loader (bin) using Libloading

Finally, the host application loads this cdylib artifact at runtime using the libloading crate.

The critical trap here is memory safety. When you load a dynamic library, the virtual table (vtable) that your Box<dyn ComputeEngine> references, along with the machine code for every trait method and the destructor, lives strictly inside that loaded file's memory mapping. If the libloading::Library instance is dropped, the library is unloaded. Any subsequent call to your trait methods, including the implicit drop of the Box, is undefined behavior: it will typically hit unmapped addresses and crash with a segmentation fault.

To make this bulletproof, I created a wrapper struct that ties the lifetimes of the loaded library and the trait object together, ensuring the library remains pinned in memory for as long as the engine interface exists.

// In host_app/src/main.rs
use compute_api::ComputeEngine;
use std::sync::Arc;

// Define the exact signature of our un-mangled native Rust ABI factory
type FactorySymbol = unsafe extern "Rust" fn(&str) -> Result<Box<dyn ComputeEngine>, String>;

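// A safe container ensuring the underlying library stays alive in memory.
// Field order matters: Rust drops fields in declaration order, so `engine`
// must come first. Its destructor lives in the plugin's code and has to run
// before the library is unloaded.
struct RuntimePlugin {
    pub engine: Box<dyn ComputeEngine>,
    // Declared last so it is dropped last; the Arc lets other handles share it
    _raw_library: Arc<libloading::Library>,
}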

impl RuntimePlugin {
    pub fn init(binary_path: &str, target_config: &str) -> Result<Self, String> {
        unsafe {
            // 1. Open the dynamic binary file from the disk
            let lib_file = libloading::Library::new(binary_path)
                .map_err(|err| format!("Failed to load dynamic file: {err}"))?;

            let shared_library = Arc::new(lib_file);

            // 2. Fetch our un-mangled factory symbol using our exact signature definition
            let factory: libloading::Symbol<FactorySymbol> = shared_library
                .get(b"load_engine_interface")
                .map_err(|err| format!("Failed to locate symbol: {err}"))?;

            // 3. Invoke the factory across the library boundary using the Rust ABI
            //    (sound only if host and plugin come from the same synchronized build)
            let engine = factory(target_config)?;

            Ok(RuntimePlugin {
                _raw_library: shared_library,
                engine,
            })
        }
    }
}

fn main() {
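    // Select the correct filename depending on our target platform.
    // A bare filename is resolved through the platform's library search rules;
    // pass a full path to load a file from a specific directory.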
    #[cfg(target_os = "windows")]
    let plugin_filename = "hardware_backend.dll";
    #[cfg(target_os = "linux")]
    let plugin_filename = "libhardware_backend.so";
    #[cfg(target_os = "macos")]
    let plugin_filename = "libhardware_backend.dylib";

    // Dynamic runtime activation
    let plugin = RuntimePlugin::init(plugin_filename, "cuda_enabled").unwrap();

    println!("Loaded engine type: {}", plugin.engine.engine_kind());

    // Setup an ergonomic native mutable closure to pass through the library wall
    let mut token_emissions = 0;
    let mut render_stream = |token: &str| {
        println!("Host captured event stream token: {token}");
        token_emissions += 1;
    };

    // Interact with the trait object seamlessly as if it were statically linked
    let computation_result = plugin
        .engine
        .stream_data("large_dataset_buffer", 500, &mut render_stream)
        .unwrap();

    println!("Final Result String: {computation_result}");
    println!("Total token callback events: {token_emissions}");
}
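
Rust drops struct fields in declaration order, which matters for a wrapper like RuntimePlugin: the engine's destructor is code inside the plugin, so it has to run before the library handle is released. The sketch below demonstrates the ordering with stand-in types only. `Noisy`, `Wrapper`, and `drop_order` are hypothetical names, and no real library is loaded.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Stand-in for a value with a destructor; records its name when dropped.
pub struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Mirrors the shape of the plugin wrapper: engine first, library last.
pub struct Wrapper {
    _engine: Noisy,
    _library: Noisy,
}

/// Builds a wrapper, drops it, and reports the order the fields were dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let wrapper = Wrapper {
        _engine: Noisy { name: "engine", log: Rc::clone(&log) },
        _library: Noisy { name: "library", log: Rc::clone(&log) },
    };
    drop(wrapper);
    let order = log.borrow().clone();
    order
}
```

`drop_order()` returns `["engine", "library"]`. Swapping the two field declarations would reverse the result, which in the real wrapper would mean running the engine's destructor after its code had already been unloaded.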

Choosing the Right Tool for the Job

Building dynamic internal components in Rust does not mean you have to surrender your workflow to the strict limitations of the traditional C ABI. By separating binary file structure (cdylib) from execution boundaries (extern "Rust"), you unlock a beautiful architectural superpower. You can pass complex traits, nested results, and stateful closures across dynamic walls with zero serialization overhead and zero manual memory boilerplate.

However, this design demands full control over your compilation environment. Because the internal Rust ABI provides no long-term stability guarantees, this structure is a secret weapon for internal modularity only. It is exceptionally well-suited for splitting up heavy feature dependencies or swap-capable hardware runtimes within a single cohesive product that is built all together inside a uniform, synchronized continuous integration run.

If you are building an open ecosystem where unknown third-party developers write dynamic extensions on their own machines with distinct compiler versions, this pattern is unsound and will eventually crash or corrupt memory. For public plugin ecosystems, your safe paths are still standard extern "C" boundaries, or sandboxing execution inside a WebAssembly engine.

But if you own the entire build and deployment flow? Dropping dylib bloat in favor of isolated, LTO-friendly cdylib targets while keeping the ergonomic beauty of native Rust functions is an unmatched development experience. You get backends that can be swapped at runtime, excellent binary optimization, and interfaces type-checked against one shared trait on both sides, without writing a single line of tedious translation boilerplate.
