Verkkokauppa.com

Creating an FFI-compatible C-ABI library in Rust

Otto Rask

This is part 2 in our PHP FFI + Rust blog series. Previously we took a look at how the FFI feature can be enabled and used in PHP 7.4, and now we will jump over to Rust and see how we can create C-ABI libraries ourselves, which can then be loaded using PHP FFI.

Why Rust?

C would be the obvious first choice when writing C-ABI libraries, yes. But alas, I am more versed in Rust than in C. I know how Rust works in general and what features are available, but am not an expert by any means, while I only know how to compile a C program and that's about it.

When using Rust there are some extra steps needed when compiling C-ABI compatible libraries, namely making sure the functions and data we expose use the C calling convention and C-compatible data layouts, and creating a header file which C-ABI consumers can use to locate the code they want to run.

Requirements

We will be working with stable Rust and a single external crate in this post. I assume you have Rust and Cargo installed and are ready to compile Rust programs from the command line. I am working on Linux Ubuntu, but the steps should be similar for other environments as well.

Example code

All code introduced in the series is available at GitHub in the examples repository I have created. Check the 201-rusty-hello-world directory, which contains a runnable example for the code we're about to write here in this post. You can edit and tinker with the code on your own machine and see what happens.

rask/php-ffi-examples: Runnable examples to learn how PHP FFI works

Rust Hello World

First we want to make a regular non-C-ABIfied Rust library just to have a starting point for this post. We will take this starting library and convert it to a C-ABI dynamic library in later steps.

Let's initialize a new blank library:

$ cargo init --lib my-library

With the skeleton in place, we can check that it works by running cargo test, which should report a passing result for the dummy test that was scaffolded for us in src/lib.rs:

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Now we can go in and write a Hello World library for the greater good! Open the project in an editor of your choice, and let's begin by altering the unit test inside lib.rs. We want to have a function that returns the string Hello world! to whoever calls it.

Our test could look like:

#[cfg(test)]
mod tests {
    #[test]
    fn test_library_function_returns_correct_string() {
        let result = get_hello_world();

        assert!(result == String::from("Hello world!"));
    }
}

We call a function and expect it to return "Hello world!". Simple stuff so far. Let's run the test:

$ cargo test
error[E0425]: cannot find function `get_hello_world` in this scope
 --> src/lib.rs:5:22
  |
5 |         let result = get_hello_world();
  |                      ^^^^^^^^^^^^^^^ not found in this scope

error: aborting due to previous error

Oh no! Our function is nowhere to be found! Let's fix that. In the lib.rs file, create a new function to fix our error:

/// Return a hello world string to the caller.
fn get_hello_world() {}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_library_function_returns_correct_string() {
        let result = get_hello_world();

        assert!(result == String::from("Hello world!"));
    }
}

We created a new function, and also added use super::*; to the tests module definition to load it into the test scope. Now if we attempt to run tests, we are greeted with a different compilation error:

error[E0308]: mismatched types
  --> src/lib.rs:11:27
   |
11 |         assert!(result == String::from("Hello world!"));
   |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected (), found struct `std::string::String`
   |
   = note: expected type `()`
              found type `std::string::String`

Now the compiler knew the function was available, but type checking failed as we are not returning a String from the get_hello_world() function. Let's fix that:

fn get_hello_world() -> String {
    return String::new();
}

Let's test again:

running 1 test
test tests::test_library_function_returns_correct_string ... FAILED

failures:

---- tests::test_library_function_returns_correct_string stdout ----
thread 'tests::test_library_function_returns_correct_string' panicked at 'assertion failed: result == String::from("Hello world!")', src/lib.rs:13:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.


failures:
    tests::test_library_function_returns_correct_string

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

We did not get compilation errors, which is great, and now our actual testcase failed. We can see that a string was returned (because it compiled) but the strings do not match. To see what the comparison contains, we can convert to assert_eq!():

#[test]
fn test_library_function_returns_correct_string() {
    let result = get_hello_world();

    assert_eq!(result, String::from("Hello world!"));
}

And now when we run cargo test we get a better understanding of what exactly we are comparing:

running 1 test
test tests::test_library_function_returns_correct_string ... FAILED

failures:

---- tests::test_library_function_returns_correct_string stdout ----
thread 'tests::test_library_function_returns_correct_string' panicked at 'assertion failed: `(left == right)`
  left: `""`,
 right: `"Hello world!"`', src/lib.rs:13:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.


failures:
    tests::test_library_function_returns_correct_string

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

Pay attention to the lines starting with left and right. We see the right side is the expected value from our test, and the left side does not match. We now need to fix our code to make them match:

fn get_hello_world() -> String {
    return String::from("Hello world!");
}

Now our tests should pass gloriously:

running 1 test
test tests::test_library_function_returns_correct_string ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests example-01-hello-world-library

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Woot!

We now have a library we could package up and ship to crates.io for others to use! But let's not do that, as we have other plans. Time to make it C-ABI compatible!

Writing C-compatible Rust

Writing C-compatible Rust might sound scary, complicated, and strange, but it really is not too difficult. Rust and C share quite a bit in the ABI-compatibility department, meaning in general Rust can be compiled as a C library assuming you're not doing anything strange.

The biggest pain point is converting between C and Rust data types and memory guarantees.

Assuming our library function in Rust returns a String, which is an entirely Rust concept, we cannot use that directly in C. What we can do, is juggle with a raw pointer (gasp!) and convert the data held by our String into a C-friendlier format.
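To make "C-friendlier" a bit more concrete, here is a minimal sketch of the conversion on its own. The helper name to_c_string is made up for this illustration and is not part of the library we are building. It shows that CString appends the terminating NUL byte that C expects, and that the conversion fails if the Rust string already contains a NUL byte somewhere in the middle:

```rust
use std::ffi::CString;

/// Hypothetical helper (not part of the article's library): convert a Rust
/// string slice into a CString, returning None instead of panicking.
pub fn to_c_string(input: &str) -> Option<CString> {
    // CString::new copies the bytes and appends the terminating NUL byte.
    // It returns an error if the input contains a NUL byte itself, because
    // C code would treat that byte as the end of the string.
    CString::new(input).ok()
}
```

This is why the conversion in a real function needs some form of error handling, be it expect(), unwrap(), or something more graceful.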

Let's introduce two new functions which become our public API over C-ABI. Add the following functions and tests for them in the lib.rs file:

use std::os::raw::c_char;
use std::ffi::CString;

fn get_hello_world() -> String {
    return String::from("Hello world!");
}

#[no_mangle]
pub extern "C" fn c_hello_world() -> *mut c_char {
    let rust_string: String = get_hello_world();

    // Convert the String into a CString
    let c_string: CString = CString::new(rust_string).expect("Could not convert to CString");

    // Instead of returning the CString, we return a pointer for it.
    return c_string.into_raw();
}

#[no_mangle]
pub extern "C" fn c_hello_world_free(ptr: *mut c_char) {
    unsafe {
        if ptr.is_null() {
            // No data there, already freed probably.
            return;
        }

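        // Here we reclaim ownership of the data the pointer points to, to free the memory properly.
        // Dropping the rebuilt CString explicitly makes the intent clear and avoids an "unused result" warning.
        drop(CString::from_raw(ptr));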
    }
}

#[cfg(test)]
mod tests {
    use std::os::raw::c_char;
    use super::*;

    #[test]
    fn test_library_function_returns_correct_string() {
        let result = get_hello_world();

        assert_eq!(result, String::from("Hello world!"));
    }

    #[test]
    fn test_library_cabi_function_works() {
        let ptr: *mut c_char = c_hello_world();
        let cstring;

        unsafe {
            cstring = CString::from_raw(ptr);
        }

        assert_eq!(CString::new("Hello world!").unwrap(), cstring);
    }
}

The good old get_hello_world() function has been left as is, but we've introduced a new function that calls it: c_hello_world(). We also have a secondary function c_hello_world_free(ptr: *mut c_char), which is used to free the memory that the first function has reserved. We will take a look at it later.

c_hello_world() looks like a regular Rust function, but with some sprinkling on top:

  • #[no_mangle]
  • extern "C"
  • -> *mut c_char

The no_mangle attribute instructs the Rust compiler to not alter the function name when it is inserted to a binary file. This makes it easier for FFI users to call it, as the name is kept as "human-readable".

Writing extern "C" defines that this function should be callable from outside Rust codebases, and the "C" portion instructs the Rust compiler to use the C calling convention (the C ABI) for it, so C-ABI consumers know how to pass arguments in and get the return value out.

The return type on the other hand is a raw pointer. A mutable one at that. Scared yet? I know I am.

Running cargo test should show that the test passes for this new function as well. In the test function we call the C-compatible function, and assert that it produces a pointer to a string that is stored in C format (CString).
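If the string handling distracts from what extern "C" does, a sketch with plain integers may help. The function names below are hypothetical and only serve this illustration. The point is that the calling convention is part of the function's type, and that an extern "C" function can still be called from Rust like any other function. To export it from the library you would also add the no_mangle attribute, as we did above; it is left out here to keep the snippet focused on the ABI part.

```rust
use std::os::raw::c_int;

/// Hypothetical example: a function that uses the C calling convention.
pub extern "C" fn c_add(a: c_int, b: c_int) -> c_int {
    // wrapping_add avoids a panic on overflow, which we do not want to
    // happen inside a function that is called from foreign code.
    a.wrapping_add(b)
}

/// The ABI is part of the function pointer type: this parameter only
/// accepts functions that were declared with extern "C".
pub fn call_through_c_abi(
    f: extern "C" fn(c_int, c_int) -> c_int,
    a: c_int,
    b: c_int,
) -> c_int {
    f(a, b)
}
```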

Raw pointers as explained by a Rust newbie

Rust has two of these raw pointer thingies: *const T and *mut T, where T is the type of data the pointer points to in memory. The *const and *mut parts seem like they are some dereferencing operator thingies, but they are not. They are literal type declarations for raw pointers. I was confused as well.

It is relatively rare to need to use these types of pointers when you're working on a pure Rust code base. They offer some powers that regular Rust does not provide, and are also required when interfacing with other languages, such as C.

What raw pointers essentially allow us to do is move stuff between the Rust world and C world using them. Well not move the stuff itself, but more like give an address from Rust to C and vice versa: "The thing you want is here at this address, but I am not helping you with how to use it or process it". Yay pointers!

The data raw pointers point at can be anything. It could be *const String, or maybe *mut MyStruct. The type declaration just lets Rust know what it might be working with.

Working with raw pointers is unsafe in Rust, meaning you need to know what you're doing with them. For simple things you can get away with one or two unsafe blocks, but for complex stuff, you must pay attention. Otherwise you're corrupting or leaking information in memory.

Passing the pointers themselves around is not unsafe, but the second you want to read, modify, or delete the data they point to, you are in unsafe territory.
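A small sketch of that boundary, using a made-up function name: creating and copying the raw pointer compiles without unsafe, while reading and writing through it does not.

```rust
/// Hypothetical example: double a value by going through a raw pointer.
pub fn double_through_pointer(value: &mut i32) -> i32 {
    // Creating a raw pointer from a reference is safe.
    let ptr: *mut i32 = value;

    // Copying the pointer around is safe as well.
    let same_ptr = ptr;

    // Reading or writing the data behind the pointer requires unsafe.
    // It is sound here because the pointer comes from a valid reference
    // that is still alive for the whole function.
    unsafe {
        *same_ptr *= 2;
        *ptr
    }
}
```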

You also need to make sure the data behind raw pointers is freed properly. In Rust terms, when we create a raw pointer, we lose ownership and lifetime guarantees to a degree.

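In our code we use CString::into_raw() and CString::from_raw() to momentarily release Rust ownership to C land, and then take it back. from_raw() rebuilds the CString from the pointer, so Rust owns the data again and frees the memory using its usual rules once the CString goes out of scope. If you do not do this, you will most probably leak memory in your programs. The pointer given to from_raw() must be one that originally came from into_raw(), and it must not be released with C's free() instead.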

You should also check if you're working with a null pointer before doing anything with the data the pointer may or may not point to.
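As a sketch of such a check, here is a hypothetical function (not part of the article's library) that reads a C string it receives from the outside. It bails out early on a null pointer and only then touches the data:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical example: return the length in bytes of a C string,
/// or 0 when given a null pointer.
pub extern "C" fn c_string_length(ptr: *const c_char) -> usize {
    // Checking the pointer itself is safe.
    if ptr.is_null() {
        return 0;
    }

    // Reading the data is not. The caller has to guarantee that the
    // pointer refers to a valid, NUL-terminated string; a null check
    // cannot prove that.
    let c_str = unsafe { CStr::from_ptr(ptr) };

    // to_bytes() excludes the terminating NUL byte.
    c_str.to_bytes().len()
}
```

Note that the null check only catches one kind of bad pointer. A dangling or otherwise invalid pointer still passes it, which is why the function has to trust its caller.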

Read more about pointers at the Rust documentation.

What about the *_free function?

We added two functions for our C-ABI compatible API. One creates a CString, and the other is supposed to free memory which was created for the CString. How does this work and why is it done this way?

In C, you are in charge of memory. This means you are responsible for allocating memory, and freeing memory. Rust itself has guarantees in place that allow you to skip the low-level stuff most of the time, but in C you need to be more careful.

With our C-ABI we are allocating memory inside Rust, and then passing that memory pointer to someone outside Rust. This creates a situation where we "disable" the Rust borrow checker and lifetime rules for that allocation. Someone outside Rust can erase the memory, alter it, and so on.

With the free function, we provide a tool for the external code to return control of memory to Rust, after which Rust is again in charge with borrow checks and other guarantees.

In terms of FFI, we first call c_hello_world() which allocates memory inside Rust and provides us with a pointer to the allocation (T::into_raw()). After we're done using the memory, we call c_hello_world_free() with the same pointer which we received earlier, and then inside Rust we "consume" (T::from_raw()) that pointer to return control back to Rust.

A similar pattern is already in use inside PHP. fopen() and fclose() follow the same semantics, for example. You open a resource, do work with it, and then you close it to prevent mistakes from happening, even if that would just mean a memory leak or similar.
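The same create, use, free pattern works for types other than CString. The following is a sketch using Box and a made-up Counter type, so all names here are assumptions for the sake of illustration. Box::into_raw() hands the allocation out as a raw pointer, and Box::from_raw() takes it back so Rust can free it.

```rust
/// Hypothetical type handed out to foreign code as a pointer.
/// repr(C) gives the struct a C-compatible layout.
#[repr(C)]
pub struct Counter {
    value: u32,
}

/// "Open": allocate inside Rust and give up ownership as a raw pointer.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { value: 0 }))
}

/// "Use": borrow the data behind the pointer without taking ownership.
pub extern "C" fn counter_increment(ptr: *mut Counter) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // The caller must pass a pointer from counter_new() that has not
    // been freed yet.
    let counter = unsafe { &mut *ptr };
    counter.value += 1;
    counter.value
}

/// "Close": take ownership back so Rust frees the allocation.
pub extern "C" fn counter_free(ptr: *mut Counter) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        drop(Box::from_raw(ptr));
    }
}
```

After counter_free() has been called the pointer must not be used again, in the same way a closed file handle must not be read from.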

Compiling into a C library

When we compile our example library right now, we only get a Rust library out of it (.rlib), which is quite useless if we want to use the library in a C-ABI FFI setup.

To alter what is compiled and where, we need to alter our Cargo.toml configuration.

If you have not touched it since we initialized the project, it should look something like this:

[package]
name = "my-library"
version = "0.1.0"
authors = ["John Doe <johndoe@example.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

Quite bare, and a good starting point.

To change our compilation from a Rust library to a dynamic C library, we need to introduce a [lib] definition:

...

[dependencies]

[lib]
name = "my_library"
crate-type = ["cdylib"]

After that is in there, running cargo build should result in a libmy_library.so library appearing inside target/debug.

Hey, now we could take that file and use it in PHP! Does the following look familiar?

<?php

$ffi = \FFI::cdef(
    'ok this is missing for now',
    'libmy_library.so'
);

Neat stuff. The only thing we now need is a header. Let's create a new file called my_library.h in the target/debug directory (for now, later we will move those things elsewhere):

#define FFI_LIB "libmy_library.so"

char *c_hello_world(void);
void c_hello_world_free(char *ptr);

Now we have

  • libmy_library.so
  • my_library.h

We can use those in PHP as follows:

<?php declare(strict_types = 1);

$header = file_get_contents('/path/to/my_library.h');

$ffi = \FFI::cdef(
    $header,
    '/path/to/libmy_library.so'
);

$cstr = $ffi->c_hello_world();
$phpstr = \FFI::string($cstr);
$ffi->c_hello_world_free($cstr);

echo $phpstr;

Running that should output Hello world! in your terminal. How cool! Notice we first called the string (read: pointer) returning function, and after we read the contents to a PHP string, we call the memory freeing function to make sure we are not creating leaks.

Next up: how to automate writing these C header files, instead of manually tinkering with them every time our C-ABI API changes in Rust.

Automating C headers generation with cbindgen

In the Rust ecosystem, there are a multitude of libraries and tools with bindgen in their names. cbindgen, wasm-bindgen, etc. The naming comes from "bindings generator", and they are used to assist in creating cross language and cross binary bindings. With cbindgen we can create bindings (headers) from Rust in a C-ABI compatible way.

Using cbindgen for really small projects might be overkill, but I would say it is not a heavy dependency to maintain.

To get cbindgen we need to modify our Cargo.toml:

...

[dependencies]

[build-dependencies]
cbindgen = "0.9.*"

...

We added a new section, called build-dependencies. Dependencies listed there are only available to the crate's build process (the build script), not to the crate's own code. cbindgen is used during the build process, so we declare it this way.

Cargo will fetch and compile the dependency the next time you run cargo build. If you want to download it ahead of time, you can run cargo fetch.

Once installed, we need to create a build step. Cargo supports a thing called build scripts, which are bare Rust files that are invoked when building a crate. Create a build.rs file next to Cargo.toml, and insert the following contents:

extern crate cbindgen;

use std::env;
use cbindgen::Language;

fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();

    cbindgen::Builder::new()
        .with_crate(crate_dir)
        .with_language(Language::C)
        .generate()
        .expect("Unable to generate bindings")
        .write_to_file("target/debug/my_library.h");
}

It is short, and the only thing we do is instantiate a cbindgen builder instance, alter the configuration a bit, then generate and write the headers to a wanted location on the filesystem.

Cargo picks up a build.rs file in the package root automatically, but we can also state the build script explicitly in the Cargo.toml file:

[package]
name = "my-library"
version = "0.1.0"
authors = ["John Doe <johndoe@example.com>"]
edition = "2018"
build = "build.rs"

...

Notice the build key containing the build.rs file path.

When we now run cargo build, hopefully we see some header generation magic happen.

(Note: I had to run build twice, as somehow the cbindgen crate had to be compiled in two runs. Try that if nothing that looks like a header file appears.)

Once compilation is done, check the target/debug directory. Can you see the my_library.h file? Check the contents. It should show a header similar to the one we manually wrote earlier, but now it has some new stuff. This means the cbindgen build worked, and now we have automated the generation of header files for our C-ABI library.

What cbindgen does is parse our source code for definitions that are marked as extern "C" or similar, and generate the bindings automatically based on those.

We would like to have cbindgen automatically insert the FFI_LIB definition into the header as well, in order to be able to use the \FFI::load() method instead of the \FFI::cdef() method in our PHP code. Also there are some unwanted includes in the header which are not needed with PHP FFI as far as I know.

cbindgen currently has no configurability for creating constants like FFI_LIB, but we can hack those in using the cbindgen::Builder::with_header() method. Add the following changes to build.rs:

cbindgen::Builder::new()
    .with_crate(crate_dir)
    .with_language(Language::C)
    .with_no_includes() // First we strip the default include lines which are not needed
    .with_header("#define FFI_LIB \"libmy_library.so\"") // then we insert our custom definitions, the .so path is relative to the header file we're writing
    .generate()
    .expect("Unable to generate bindings")
    .write_to_file("target/debug/my_library.h");

A bit hacky, yes, but works. Now if you run cargo build, the header should contain a new line which defines the FFI_LIB constant, and the default includes should be gone. Now we have a header file which is compatible with \FFI::load() in PHP land.

What's next?

Okay, so now we know how to write basic C-ABI and FFI-ready libraries in Rust, and also took a super quick peek at how to use those in PHP via FFI. Additionally we looked at how to use cbindgen to do some tedious work for us automatically.

Homework: try and see if you can work out how to add a user-supplied parameter to the get_hello_world and c_hello_world functions, so they can return Hello World, <param>! or similar.

In the upcoming posts in this series we will be writing some more complicated code, meaning we need to learn a bit about the various types that are available in PHP FFI from the C-ABI, and how to write Rust code that operates on those types as well.

Discussion

jeikabu

I’m not particularly interested in PHP, but I enjoy seeing Rust ffi shenanigans. cbindgen has been on my mind for a while but I’ve yet had occasion to use it. Anyway, interesting stuff.

Otto Rask Author

The post cdylib and header output should work as is with non-PHP callees as well, unless I've done some shady stuff in the code that I do not understand about. :D

Thanks!

jeikabu

I only gave it a quick glance, but looks sane to me. Multi-platform development and ffi in general has given me a newfound appreciation for C. Death to c++. 🤫