Eduardo Pinho

Posted on Mar 15, 2022

Writing bindings to `dos-like` for Rust: some lessons learned

#rust #msdos #showdev #gamedev

My childhood included a fair share of MS-DOS games from the early to mid 1990s. Even decades later, I enjoy looking back at those games and studying the technologies surrounding them, such as Sound Blaster audio and the various VGA graphics modes of the time, with an honorable mention to mode 13h.

Now, being both a Rust enthusiast and an MS-DOS nostalgic, I have tried multiple times to close the gap on writing applications for real DOS systems in Rust. Unfortunately, this is not without issues, and there is no clear path for doing so yet. More on existing efforts here.

So when I had some spare time last weekend, I decided to do something a bit different: bring to Rust an existing framework that lets you write applications that look like DOS applications.
dos-like, made by Mattias Gustavsson, is a small engine for writing modern applications with the look & feel of MS-DOS programs. When using this framework, we end up with applications that run on modern hardware and operating systems all the same, but with deliberate video effects and audio that bring us back to that era, including large pixels, CRT distortion, text and graphics video modes, and synthesized (Sound Blaster 16) or MIDI (Sound Blaster AWE32) music. It is written in C, mostly as a single file plus a few statically linked dependencies. The project also comprises a few fun examples, such as a proof-of-concept FPS inspired by Wolfenstein 3D, a point-and-click adventure, a voxel renderer, and even a music tracker.

By creating direct bindings to the C interface (dos-like-sys), followed by a higher-level abstraction, it becomes possible and (hopefully) intuitive to write applications of this sort in Rust! So I did just that!
In this post, I share a small collection of technical topics from the making of these bindings that I felt were worth writing about.

Compiling and static linking

Although I had written bindings to dynamically linked libraries in the past, this one clearly called for statically linking a C object compiled on the spot. By following the documentation of the cc crate and looking at the build.rs files of other bindings, I managed to make compilation and linking work on Linux and Windows, straight from the original sources, which are fetched as a git submodule. A few extra steps were needed on Linux, where dos-like depends on SDL2.
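To make the Linux-specific step more concrete, here is a minimal sketch of the link-directive part of a build.rs. It assumes the C sources have already been compiled (for example with the cc crate, shown only in a comment, with a source path that is an assumption about the submodule layout), and it only covers the SDL2 dependency mentioned above; the real dos-like-sys build script may link more libraries. The helper names `link_directives` and `emit_link_directives` are hypothetical.

```rust
use std::env;

/// Hypothetical build.rs helper: returns the `cargo:` link directives for the
/// given target OS. Only the SDL2 dependency on Linux is covered here.
fn link_directives(target_os: &str) -> Vec<String> {
    let mut directives = Vec::new();
    if target_os == "linux" {
        // dos-like's Linux backend uses SDL2, which must be linked explicitly.
        directives.push("cargo:rustc-link-lib=SDL2".to_string());
    }
    directives
}

/// What build.rs's `main` would call after compiling the C sources, e.g.:
/// `cc::Build::new().file("dos-like/source/dos.c").compile("dos-like");`
/// (the path above is an assumption, not taken from dos-like-sys).
fn emit_link_directives() {
    // Cargo sets CARGO_CFG_TARGET_OS for build scripts; it names the target
    // being compiled for, not the host running the script.
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    for line in link_directives(&target_os) {
        println!("{line}");
    }
}
```

Reading the target from `CARGO_CFG_TARGET_OS` rather than using `cfg!(target_os = ...)` matters because a build script runs on the host, which can differ from the target when cross-compiling.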

Alas, although the original dos-like supports WebAssembly, the Rust bindings do not support it yet. That would require integrating a Rust toolchain with WAjic, which I am largely unfamiliar with. If you have any idea on how to achieve this, I would love to know.

Is this a library?

One of the most interesting things about the final dos-like package delivered to crates.io is that, while it is indeed declared as a Rust library, it has a strong caveat that needs to be considered whenever it is added to a project: it provides its own main function!

What makes it so easy to create applications with dos-like is that it takes care of all application bootstrapping by itself. When writing a new program in C or C++, this is what happens:

In a new C file, the user includes dos.h and writes a main function definition as usual.
That C file is linked alongside dos.c, which reassigns the main function to a function named dosmain.
The framework replaces the main function definition with its own function, which does, among other things, call dosmain.

In a plain C environment, this kind of symbol reassignment can be done with macros. But Rust does not see C macros! The renaming trick cannot reach a main function declared in Rust land, so the linker would find two definitions of main (the one in dos.c and the one generated for the Rust binary) and fail to link.

So the solution to making this work is this:

Add the no_main attribute, so that Rust neither generates nor requires a main function;
Declare an extern "C" function called dosmain instead, marked #[no_mangle] so that dos.c can find it by name.

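#![no_main]

// dos.c calls this symbol by name, so it must not be mangled.
#[no_mangle]
pub extern "C" fn dosmain() -> i32 {
    // application code goes here
    0
}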

I also added a Rust macro to assist with this last declaration, but I find it neither very usable nor idiomatic.

Since the outermost part of the program is a C function, a panic cannot soundly unwind past it. I chose to recommend that users abort on panic instead, by writing this in their Cargo.toml file:

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
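For comparison, a common alternative when Rust code is called from C is to stop the unwind at the boundary with `std::panic::catch_unwind`. The sketch below is not what the dos-like crate does; `run_guarded` is a hypothetical helper, and it only helps when the profile keeps the default `panic = "unwind"` (with `panic = "abort"` the process aborts before `catch_unwind` can intervene).

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard for the body of `dosmain`: runs the application and
/// turns a panic into a non-zero exit code instead of unwinding into C.
fn run_guarded<F: FnOnce() -> i32>(app: F) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(app)) {
        Ok(code) => code,
        // 101 mirrors the exit code Rust uses when `main` panics.
        Err(_) => 101,
    }
}

// In an application, this could be wired up as:
// #[no_mangle]
// pub extern "C" fn dosmain() -> i32 {
//     run_guarded(|| { /* application code */ 0 })
// }
```

The trade-off is that the panic message is still printed, but the C side gets an ordinary return value instead of undefined behavior.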

Is that an `int`?

Most functions in the framework were pretty easy to translate to Rust. With the low-level bindings already generated automatically from the C declarations via bindgen, all that was left was to wrap them in safe functions with idiomatic parameter types. Many of these functions took plain C ints as parameters. Instead, the parameter types were chosen based on what they represented:

u8 is great for color identifiers, since the application is limited to a 256-color palette. It was also used for the RGB components of each color, for which the functions expected integers between 0 and 255 anyway (although in the end they are trimmed to a precision of 6 bits per component, replicating the 18-bit palette of the time).
u16 was used for x and y coordinates, both in graphics mode and in text mode. 16 bits was a pretty common integer size back then, and the screen width or height would never be larger than this.
Where a 0 or 1 were expected for stating whether something should be on or off, a bool is used instead. Easy peasy.
Some functions expected a C string, as in a pointer to a null-terminated sequence of characters. A function that receives a standard Rust string slice &str has to allocate a new string with a null character \0 appended at the end. For those who already have a null-terminated string at hand, a separate function was created that receives a &CStr instead.
Certain resource identifiers were replaced with high-level abstractions, such as Soundbank.
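The parameter mapping above can be sketched as follows. To keep it checkable without linking dos-like, `fake_outtextxy` is a hypothetical stand-in for a bindgen-generated declaration in the style of `outtextxy(int x, int y, char const* text)`; unlike the real function it returns the text length so the result can be observed. The 6-bit reduction in `to_vga_component` is an assumption (a plain right shift); dos-like's exact rounding may differ.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::{c_char, c_int};

// Hypothetical stand-in for a C drawing function taking ints and a C string.
unsafe extern "C" fn fake_outtextxy(x: c_int, y: c_int, text: *const c_char) -> usize {
    let _ = (x, y);
    unsafe { CStr::from_ptr(text).to_bytes().len() }
}

/// Safe wrapper taking u16 coordinates and a &str. Allocates a CString with a
/// trailing '\0'; fails if `text` contains an interior NUL byte.
pub fn out_text_xy(x: u16, y: u16, text: &str) -> Result<usize, NulError> {
    let c_text = CString::new(text)?;
    Ok(out_text_xy_cstr(x, y, &c_text))
}

/// Variant for callers who already have a null-terminated string: no allocation.
pub fn out_text_xy_cstr(x: u16, y: u16, text: &CStr) -> usize {
    // u16 -> c_int is lossless, and `text` outlives the call.
    unsafe { fake_outtextxy(c_int::from(x), c_int::from(y), text.as_ptr()) }
}

/// On/off flags are bools in Rust and become the 0/1 the C side expects.
fn flag(on: bool) -> c_int {
    c_int::from(on)
}

/// Assumed reduction of an 8-bit color component to the 6 bits per component
/// of the VGA palette (3 components x 6 bits = 18 bits).
fn to_vga_component(value: u8) -> u8 {
    value >> 2
}
```

The pair of text functions mirrors the trade-off described above: the &str version is the convenient default, while the &CStr version skips the allocation.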

Since I often had to look into how each function was implemented, this was also an opportunity to document them. Maybe one day someone can contribute some of these docs back to the original dos-like, if they still apply there.

Exclusive, safe video memory access? Pick one.

Now, dos-like is quite usable through its various drawing primitives, such as line, bar, blit, circle, outtextxy, ... But dos-like also gives you a way to directly manipulate the video memory buffer that gets written to the screen:

unsigned char* screenbuffer(void);

The output is a plain pointer to width * height bytes. If double buffering is enabled, swapbuffers passes that data to the display and gives you another buffer to write to. Both buffers are allocated once at boot; the library simply hands out one or the other in turn on each call to swapbuffers.

unsigned char* swapbuffers(void);

These functions are often used in practice like this:

setvideomode(videomode_320x200);
setdoublebuffer(1);

unsigned char* buffer = screenbuffer();
while (shuttingdown() == 0) {
    waitvbl();

    // write things to buffer
    for (int i = 0; i < 320*200; i++) {
        buffer[i] = (i & 1) * 15;
    }

    // swap buffer
    buffer = swapbuffers();

    // ... handle user input
}

Now, so long as you do not write beyond the boundaries of any of the given memory buffers, you are safe and have nothing to worry about. But how can we make this both easy and safe to use in Rust?

The easiest way to use a writable memory buffer in Rust is through a mutable byte slice: &mut [u8]. One could write a function like this:

unsafe fn screen_buffer() -> &'static mut [u8] {
    let len = get_screen_resolution(); // hypothetical helper: width * height, in bytes
    std::slice::from_raw_parts_mut(screenbuffer(), len)
}

But this is completely memory unsafe! Mutable slices expect exclusive access to that piece of memory, with no aliasing of any kind. Unlike raw pointers, having more than one mutable slice to this video buffer is instant undefined behavior. And this function makes that as easy as calling it twice without swapping the buffers first.

unsafe {
    let buffer = screen_buffer();
    let buffer_copy = screen_buffer(); // UB :[
}

In addition, calling any other drawing function while we hold this slice is undefined behavior as well, since the C code would write to memory that the slice claims exclusive access to.

unsafe {
    let buffer = screen_buffer();
    circle(120, 100, 16); // also UB :[
    buffer[0] = 1;
}

I did try to think of other ways to verify exclusive access at compile time, such as creating a static global instance with a phantom lifetime to serve as a lock. But this was not enough to model double buffering, as it would prevent the legitimate pattern of swapping buffers and fetching a slice to the new buffer. Even with both functions returning mutable slices, each slice would be exclusive, and reusing the same variable would be safe, since reassigning it ends the previous borrow before two slices could alias:

set_double_buffer(true);

let mut buffer = unsafe {
    screen_buffer() // slice to buffer0
};
while !shutting_down() {
    wait_vbl();

    // use buffer

    buffer = unsafe {
        swap_buffers() // slice to buffer1 (iteration #0)
    }; // slice to buffer0 (iter #0) dropped immediately
}

A safe abstraction for access to the two video buffers (let's call them buffer0 and buffer1) would:

allow the user to retain at most one mutable slice to buffer0 and at most one mutable slice to buffer1;
let the user seamlessly switch between the two in the main application loop, with a reassignment or something equally usable;
AND prevent calls to any other drawing primitives for as long as any of the slices are held.

I couldn't come up with this yet, but feel free to reach out if you have a better idea!
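One possible direction, sketched here under clear simplifications, is an exclusive token: a single `Screen` value whose methods all take `&mut self`, so the borrow checker forbids drawing calls while a buffer slice is alive. This is not what the dos-like crate does, and it is stricter than the list above, since only one slice can exist at a time rather than one per buffer. `Screen`, `take`, `put_pixel` and `draw_frames` are hypothetical; the two `Vec`s stand in for the framework-owned buffers, which a real version would obtain from screenbuffer() and swapbuffers().

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set once the single screen token has been handed out.
static SCREEN_TAKEN: AtomicBool = AtomicBool::new(false);

/// Hypothetical exclusive token for video memory (simulated with Vecs).
pub struct Screen {
    buffers: [Vec<u8>; 2],
    current: usize,
}

impl Screen {
    /// Returns the token on the first call and `None` afterwards.
    pub fn take(width: usize, height: usize) -> Option<Screen> {
        if SCREEN_TAKEN.swap(true, Ordering::AcqRel) {
            return None;
        }
        let size = width * height;
        Some(Screen { buffers: [vec![0; size], vec![0; size]], current: 0 })
    }

    /// Buffer currently being drawn to; borrows the whole token mutably.
    pub fn buffer(&mut self) -> &mut [u8] {
        &mut self.buffers[self.current]
    }

    /// Flips to the other buffer (0 <-> 1) and returns it, like swapbuffers().
    pub fn swap_buffers(&mut self) -> &mut [u8] {
        self.current ^= 1;
        &mut self.buffers[self.current]
    }

    /// A drawing primitive also needs `&mut self`, so it cannot be called
    /// while a slice from `buffer()` or `swap_buffers()` is still in use.
    pub fn put_pixel(&mut self, index: usize, color: u8) {
        self.buffers[self.current][index] = color;
    }
}

/// The reassignment pattern from the loop above, expressed with the token.
pub fn draw_frames(screen: &mut Screen, frames: usize) {
    let mut buffer = screen.buffer();
    for frame in 0..frames {
        buffer.fill(frame as u8); // draw into the current buffer
        buffer = screen.swap_buffers(); // the old slice is no longer used here
    }
}
```

The reassignment in `draw_frames` compiles because the previous slice is dead by the time `swap_buffers` borrows the token again, which covers the second requirement; holding one slice per buffer at the same time is what this sketch gives up.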

In the end, the two functions were exposed as unsafe, with detailed safety guidelines documented. Those who wish to avoid direct video buffer access can still use the various drawing primitives and achieve the same results with only a bit of extra overhead, which in my opinion is unlikely to become a problem.

I am also not sure whether any of this is memory safe with double buffering disabled. Oh boy.

Returning small arrays? One simple trick.

With the elephant in the room pointed out, let's go back to easier challenges. We also have a few functions to provide a list of user input events:

keycode_t* readkeys(void);
const unsigned char* readchars(void);

Both returned arrays are zero-terminated: the first holds key codes and the second holds characters. For instance, if the user held Shift and pressed A, the first function would give us the SHIFT key code and the A key code, whereas readchars would give us an upper-case A.

Keyboard and mouse events are accumulated in an internal buffer. When one of these functions is called, the recorded key codes or characters are flushed into a separate buffer and a pointer to that buffer is returned. One may be tempted to write a high-level function with this signature:

fn read_keys() -> &'static [KeyCode];

But doing this would not be memory safe! Rust requires that data behind a shared reference does not change while the reference is alive, yet subsequent calls to read_keys could overwrite the contents of that same buffer.
On the other hand, unlike screenbuffer, this pointer is only intended to be read from, and as such it is a fairly good candidate for copying over to an owned vector. Here is the complete implementation:

pub fn read_keys() -> Vec<KeyCode> {
    let mut keys = Vec::new();

    unsafe {
        let p = dos_like_sys::readkeys();
        for i in 0..=255 {
            let c = *p.offset(i);
            if c == 0 {
                break;
            }
            keys.push(KeyCode(c));
        }
    }

    keys
}
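Since readchars follows the same zero-terminated convention, the loop could be factored into a generic helper. The sketch below is hypothetical (`copy_zero_terminated` is not part of the crate) and treats `T::default()` as the terminator, which is zero for the integer types involved; it also tolerates a null pointer.

```rust
/// Copies values from a zero-terminated C array into an owned Vec,
/// reading at most `max` elements.
///
/// # Safety
/// `ptr` must be null, or point to at least `max` readable elements,
/// or to a shorter sequence that ends with a zero (`T::default()`) element.
unsafe fn copy_zero_terminated<T: Copy + Default + PartialEq>(ptr: *const T, max: usize) -> Vec<T> {
    let mut out = Vec::new();
    if ptr.is_null() {
        return out;
    }
    let zero = T::default();
    for i in 0..max {
        // Read element i; stop at the terminator.
        let value = unsafe { *ptr.add(i) };
        if value == zero {
            break;
        }
        out.push(value);
    }
    out
}
```

With such a helper, read_keys would reduce to mapping the copied values into KeyCode, and a readchars wrapper would reuse it unchanged.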

A copy is free to be read and manipulated in user code without any risk of breaking memory invariants. Unless you are into keyboard smashing or only rarely call these functions, they are expected to return very few events each time, often 0 to 2 values, so the copy is cheap anyway. Moreover, if we replace Vec<_> with a SmallVec (from the smallvec crate), we avoid heap allocations whenever the queued events fit in its inline capacity. The final signature of the high-level function is the following:

pub fn read_keys() -> SmallVec<[KeyCode; 2]>;

Validated resource identifiers

dos-like allows you to load custom fonts from .fnt files for text output in graphics mode, and custom soundbanks for synthesized music. When the functions below succeed, they save those resources in the framework and return a positive integer which identifies that resource for the rest of the application's lifetime.

int installuserfont(const char* filename);
int installusersoundbank(const char* filename);

Some functions would then accept an integer to identify the font or soundbank to use:

void settextstyle(int font, int bold, int italic, int underline);
void setsoundbank(int soundbank);

Technically, the implementation checks these identifiers, so passing an invalid ID is not undefined behavior. Still, a type-safe abstraction prevents misuse by requiring the identifier to come from a successful install first.

let font = install_user_font("files/volter.fnt")?;
set_text_style(font, false, false, false);

This may look almost identical to what would be done in C, but there is a neat safeguard in the function signatures:

pub fn install_user_font(filename: impl AsRef<str>) -> Result<Font, FileError>;

pub fn set_text_style(font: Font, bold: bool, italic: bool, underline: bool);

The first one returns an identifier of type Font only if the operation succeeds. The second one will not accept anything other than a font identifier, all of which is checked at compile time. And an instance of this type can be passed around just as cheaply as an int, because it is just a NonZeroU32 underneath! A NonZeroU32 is an unsigned 32-bit integer that is guaranteed never to be zero. The compiler can take advantage of this to represent wrapping types in fewer bytes: for example, Option<NonZeroU32> takes up 4 bytes, the same as a plain u32.
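A minimal sketch of such an identifier type, assuming (as described above) that installuserfont returns a positive integer on success. `from_raw`, `as_raw` and `text_style_args` are hypothetical names; treating every non-positive value as failure is an assumption, and `#[repr(transparent)]` is added so that the size of `Option<Font>` is covered by the documented niche guarantee.

```rust
use std::num::NonZeroU32;
use std::os::raw::c_int;

/// Font identifier: a NonZeroU32 newtype, as cheap to pass as an int.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Font(NonZeroU32);

impl Font {
    /// Hypothetical conversion from installuserfont()'s return value:
    /// positive values become a Font, anything else is treated as failure.
    fn from_raw(id: c_int) -> Option<Font> {
        if id > 0 {
            NonZeroU32::new(id as u32).map(Font)
        } else {
            None
        }
    }

    /// Raw id to hand back to C; lossless because it started as a positive c_int.
    fn as_raw(self) -> c_int {
        self.0.get() as c_int
    }
}

/// Hypothetical: the arguments a set_text_style wrapper would forward to
/// settextstyle(). Only a Font is accepted as the first argument.
fn text_style_args(font: Font, bold: bool, italic: bool, underline: bool) -> [c_int; 4] {
    [font.as_raw(), c_int::from(bold), c_int::from(italic), c_int::from(underline)]
}
```

Passing a bare integer where a Font is expected is then a type error, while `Option<Font>` still fits in 4 bytes.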

Other resources, such as music and sound, are loaded by functions that yield their output through returned pointers to static memory or through output pointers. Sound, Music, and Image were given an object-oriented design. One advantage of this is that some of the functions can intuitively be turned into methods.

let music = load_opb("files/doom.opb")?;
// play music, no looping, maximum volume
music.play(false, 255);

So those were a few tricks for adapting a C framework to writing applications in Rust. dos-like for Rust is freely available on crates.io and on GitHub. The GIF below shows an example that can be written in 60 lines of Rust code.