Michał Siedlaczek

Posted on Nov 9, 2020 • Originally published at msiedlaczek.com

Type-Safe Discrete Simulation in Rust

#rust #simulation #typesystem

In this article, I want to share some of my experience from implementing a small discrete-event simulation library. The code is available on github, and the (limited at the moment) documentation on docs.rs. Note that this implementation is experimental, so use caution if you want to use any of the code in the current form.

The main focus of this write-up is the library's type-safe interface. There is no jaw-dropping magic here, but I found the process of writing it very educational, and decided to share what I had learned. If you have any comments you'd like to share, corrections you'd like to make, or questions you'd like to ask, please do reach out, I would be very interested in hearing from you.

Keywords: rust type-safe simulation

I recently found myself in a need of a discrete-event simulation for one of my research projects. Although there are a number of existing open source libraries available for this exact problem, nothing truly worked for me. Many of them are quite old and not maintained for the past few years, some have other issues. The closest one to working for me was SimPy python library. This one, however, turned out to be very slow, and I got impatient with it quickly. SimJulia looks nice, but I have yet to learn Julia, and because the simulation is only part of the project, I wanted to use a language I'm familiar with and know what to expect from. More importantly, the simulations I need to run are not very complicated, so I figured a bespoke solution might be just what I needed. Plus, I could learn a thing or two, so why not.

Nowadays, I default to Rust when there are no language or library constraints. It should be noted that I had found a few Rust libraries that might have worked, but these didn't seem to be either mature or maintained. Therefore, I set out to write some simulation code in Rust. After prototyping several approaches, I ended up with some ideas and requirements for the core simulation module. Here, I want to focus on one important requirement that became clear quite quickly, which is type safety. Because even simple simulations have many moving pieces and are not trivial to test, I wanted to unload as much of correctness checking as possible onto the compiler.

System Components

First, let's talk about the overall system design and look at its pieces and how they interact. Note that this is by no means the best or even a correct way of designing simulations, but it seemed to fit my use case. On the other hand, nothing is written in stone, and I am open to suggestions.
At the highest level, my simulation consists of the following elements:

scheduler, which keeps track of time and events,
state, which stores any mutable information about the simulation,
components, which do some work, interact with each other, and sometimes mutate the state.

The state further distinguishes between regular values and queues, which are used to pass messages between components. Technically speaking, this distinction is not necessary, but queues are so useful that a dedicated interface is just convenient.

Requirements

One should be able to create an unlimited number of different types of components, and seamlessly inject them into the simulation. Each component can have a different type of event it reacts to. Ideally, it should be impossible to request scheduling an event of a wrong type or direct it to a component that doesn't exist. Similarly, it should be impossible to send a wrong type of value over a queue. Whenever possible, these checks should be done at compile time. In any other case, an invalid operation should terminate the simulation immediately with an appropriate error message.

Type-Safe Keys

I want to have an easy way of identifying components, stored objects, and queues with a key that can be easily stored, passed, and copied throughout the system. It would also be used as a handle to refer to the simulation's internal objects from outside of the module. In other words, I want an ID:

struct Key(usize);

Notice how the integer unnamed field is private. This prevents a user from constructing an arbitrary value and using it to, say, schedule an event on a component. The simulation itself takes responsibility for generating valid IDs. However, nothing stops us from passing a key generated for type T and use it to request type U. To rectify that, we can introduce a type marker:

#[derive(Debug, PartialEq, Eq, Hash)]
struct Key<T> {
    id: usize,
    _marker: std::marker::PhantomData<T>,
}
impl<T> Key<T> {
    // NOTE: Must be private or at most `pub(crate)` to not allow creating
    //       from outside of the crate.
    fn new(id: usize) -> Self {
        Key { id, _marker: std::marker::PhantomData }
    }
}

I precede the marker with an underscore to show it will never be used. Notice that I derived a few traits that might be useful for an ID type. But there are some that are still missing. In particular, I did not derive either Clone or Copy. The reason for this is that the compiler won't allow it unless I also constrain T to be Copy, but that would unnecessarily limit the types of values I can, say, store in the simulation state, or the types of events that can be passed. Instead, I can implement these manually:

impl<T> Clone for Key<T> {
    fn clone(&self) -> Self {
        Self::new(self.id)
    }
}
impl<T> Copy for Key<T> {}

Generating Keys

There are certainly several ways of generating these keys, but I will keep it simple. We can have a simple counter field in the simulation and increment it each time we return a new key. There is a trap here, however. Remember that our goal is to ensure type safety, which we did by introducing the type marker. Now, we need to ensure that having received a key, we can only use it to refer to the correct type of objects. If the IDs are unique only within a simulation, nothing stops the user from generating the same numerical ID of different types from two different simulations! It might be a rare problem, but there is a simple fix. Instead of storing the counter in the simulation, we can use an atomic static counter, which will ensure uniqueness of all IDs across the entire program.

use std::sync::atomic::AtomicUsize;
static ID_COUNTER: AtomicUsize = AtomicUsize::new(0);

// NOTE: Private function available only inside the module.
fn next_id() -> usize {
    ID_COUNTER.fetch_add(1, std::sync::atomic::Ordering::SeqCst)
}

State

Having defined the key type, let's use it to implement the simulation state. Let's start with the API.

impl State {
    pub fn insert<V: 'static>(&mut self, value: V) -> Key<V> { ... }
    pub fn remove<V: 'static>(&mut self, key: Key<V>) -> Option<V> { ... }
    pub fn get<V: 'static>(&mut self, key: Key<V>) -> Option<&V> { ... }
    pub fn get_mut<V: 'static>(&mut self, key: Key<V>) -> Option<&mut V> { ... }
}

When inserting a value, we receive a key that can be later used to remove it or get a reference to the stored object. As you can see, the only thing we require from the values is that they don't contain any non-static references, which is indicated by V: 'static lifetime constraint. That's all nice, but how do we actually store values of any given type? Answer: we use an std::any::Any trait object. I recommend reading the documentation to learn more about it, but in short, this is a trait used for type erasure. For example, a Box<T> can be cast to Box<dyn Any> and later downcast to its concrete type. This is illustrated in the implementation below.

struct State {
    values: std::containers::HashMap<usize, Box<dyn std::any::Any>>,
}
impl State {
    #[must_use = "Discarding key results in leaking inserted value"]
    pub fn insert<V: 'static>(&mut self, value: V) -> Key<V> {
        let id = Self::generate_next_id();
        self.store.insert(id, Box::new(value));
        Key::new(id)
    }
    pub fn remove<V: 'static>(&mut self, key: Key<V>) -> Option<V> {
        self.store
            .remove(&key.id)
            .map(|v| *v.downcast::<V>().expect("Ensured by the key type."))

    }
    pub fn get<V: 'static>(&self, key: Key<V>) -> Option<&V> {
        self.store
            .get(&key.id)
            .map(|v| v.downcast_ref::<V>().expect("Ensured by the key type."))
    }
    pub fn get_mut<V: 'static>(&mut self, key: Key<V>) -> Option<&mut V> {
        self.store
            .get_mut(&key.id)
            .map(|v| v.downcast_mut::<V>().expect("Ensured by the key type."))
    }
}

Notice I added a must_use attribute to insert because dropping the key effectively stops us from ever using that value again.

As I mentioned before, I would like to have additional interface for manipulating queues. The implementation is very similar to the value store, but some extra logic is necessary. I won't get into much details about it, as it's pretty straight forward, and rather tangent to the main point, but here's the code:

struct State {
    values: std::containers::HashMap<usize, Box<dyn std::any::Any>>,
    queues: std::containers::HashMap<usize, Box<dyn std::any::Any>>,
}
impl State {
    pub fn send<V: 'static>(&mut self, queue: QueueId<V>, value: V) -> Result<(), ()> {
        self.queues
            .get_mut(&queue.id)
            .expect("Queues cannot be removed so it must exist.")
            .downcast_mut::<Queue<V>>()
            .expect("Ensured by the key type.")
            .push_back(value)
    }

    pub fn recv<V: 'static>(&mut self, queue: QueueId<V>) -> Option<V> {
        self.queues
            .get_mut(&queue.id)
            .expect("Queues cannot be removed so it must exist.")
            .downcast_mut::<Queue<V>>()
            .expect("Ensured by the key type.")
            .pop_front()
    }

    pub fn len<V: 'static>(&mut self, queue: QueueId<V>) -> usize {
        self.queues
            .get(&queue.id)
            .expect("Queues cannot be removed so it must exist.")
            .downcast_ref::<Queue<V>>()
            .expect("Ensured by the key type.")
            .len()
    }
}

I omit the implementation of Queue itself. In my library, this is a simple wrapper over VecDeque with an added concept of limited capacity. Notice I use a separate QueueId type, which is similar to Key but works only with queues.

Event Entries

Although I want each component to have its own event type, the scheduled events still must be stored together. Furthermore, they need to be accessed in order of their scheduled time. I will use a binary heap for this. Thus, I define the following entry struct.

#[derive(Debug)]
pub struct EventEntry {
    time: Reverse<Duration>,
    component: usize,
    inner: Box<dyn Any>,
}

Notice that because the event type is erased, we store the raw component ID, which will be used to fetch the component. Of course, to use it in a binary heap, we need to also implement the Ord trait to define a total order over time.

Components

Components are the part of the simulation provided by the user, and there should be no constraint on how many we can define. Therefore, we supply a trait that must be implemented for each component.

pub trait Component {
    type Event: 'static;

    fn process_event(
        &self,
        self_id: ComponentId<Self::Event>,
        event: &Self::Event,
        scheduler: &mut Scheduler,
        state: &mut State,
    );
}

A component has an associated type Event and one associated function: process_event. It takes the following arguments:

its own ID: this is because the ID is unknown at the moment of construction (it's assigned by the simulation, recall),
the event,
the scheduler (we'll get back to this shortly),
and the state.

With this interface though, we'll hit a little snag. Recall that the event type is erased when pushed to the heap as EventEntry. Therefore, the scheduler won't know the correct type to downcast to. We could try casting to different types, but this would require us to know all possible event types beforehand. Instead, we will use a little trick:

pub struct EventEntryTyped<'e, E> {
    pub time: Duration,
    pub component_id: ComponentId<E>,
    pub component_idx: usize,
    pub event: &'e E,
}

impl EventEntry {
    #[must_use]
    fn downcast<E: 'static>(&self) -> Option<EventEntryTyped<'_, E>> {
        self.inner.downcast_ref::<E>().map(|event| EventEntryTyped {
            time: self.time.0,
            component_id: ComponentId::new(self.component),
            component_idx: self.component,
            event,
        })
    }
}

trait ProcessEventEntry {
    fn process_event_entry(
        &mut self,
        entry: EventEntry,
        scheduler: &mut Scheduler,
        state: &mut State,
    );
}

impl<E, C> ProcessEventEntry for C
where
    E: 'static,
    C: Component<Event = E>,
{
    fn process_event_entry(
        &mut self,
        entry: EventEntry,
        scheduler: &mut Scheduler,
        state: &mut State,
    ) {
        let entry = entry
            .downcast::<E>()
            .expect("Failed to downcast event entry.");
        self.process_event(entry.component_id, entry.event, scheduler, state);
    }
}

We defined another trait called ProcessEventEntry, which takes the erased EventEntry. Then, we provide a blanket implementation for any defined component that will correctly downcast the event type. This way, a user defining a new component can implement a function that operates on concrete types, and the ugly details are generated for them. Plus, the casting code cannot be overridden (and potentially broken) because the trait is not exposed to the user.

Additionally, I define a separate container for components:

pub struct Components {
    components: Vec<Box<dyn std::any::Any>>,
}

impl Components {
    #[must_use]
    pub fn add_component<E: 'static, C: Component<Event = E> + 'static>(
        &mut self,
        component: C,
    ) -> ComponentId<E> {
        let id = generate_next_id();
        let component: Box<dyn ProcessEventEntry> = Box::new(component);
        self.components.push(Box::new(component));
        ComponentId::new(id)
    }

    pub fn process_event_entry(
        &self,
        entry: EventEntry,
        scheduler: &mut Scheduler,
        state: &mut State,
    ) {
        self.components[entry.component_idx()]
            .downcast_ref::<Box<dyn ProcessEventEntry>>()
            .expect("Only objects implementing  ProcessEventEntry are ever stored.")
            .process_event_entry(entry, scheduler, state);
    }
}

Scheduler

The scheduler adds, retrieves, and executes events. It also keeps track of the simulation clock. Let's start with the latter.

type Clock = Rc<Cell<Duration>>;

pub struct ClockRef {
    clock: Clock,
}

impl ClockRef {
    #[must_use]
    pub fn time(&self) -> Duration {
        self.clock.get()
    }
}

I define a clock as a reference counting pointer. It is owned by the scheduler itself, which has mutable access to it for when it must modify the current time. For any other parts of the simulation, ClockRef only exposes immutable access to the current time. Now, we are ready to define the scheduler.

pub struct Scheduler {
    events: std::containers::BinaryHeap<EventEntry>,
    clock: Clock,
}

impl Scheduler {
    pub fn schedule<E: 'static>(
        &mut self,
        time: Duration,
        component: ComponentId<E>,
        event: E,
    ) {
        let time = self.time() + time;
        self.events.push(EventEntry::new(time, component, event));
    }

    #[must_use]
    pub fn time(&self) -> Duration {
        self.clock.get()
    }

    #[must_use]
    pub fn clock(&self) -> ClockRef {
        ClockRef {
            clock: Rc::clone(&self.clock),
        }
    }

    pub fn pop(&mut self) -> Option<EventEntry> {
        if let Some(event) = self.events.pop() {
            self.clock.replace(event.time.0);
            Some(event)
        } else {
            None
        }
    }
}

Simulation

We now have all the necessary parts and we can put them together.

pub struct Simulation {
    pub state: State,
    pub scheduler: Scheduler,
    pub components: Components,
}

impl Simulation {
    pub fn step(&mut self) -> bool {
        if let Some(event) = self.scheduler.pop() {
            self.components
                .process_event_entry(event, &mut self.scheduler, &mut self.state);
            true
        } else {
            false
        }
    }
}

This is a minimal example. The library implements more user-facing functions directly in the simulation to bypass explicitly accessing state, scheduler, and components, but that's not very interesting. The point is we have everything we need to start building simulations.

Example

Let's briefly go through some component example. Let's say we have one producer, which produces some values, and a consumer, which uses them. The producer will only react to one type of event, at which it will send a value to the producer and schedule itself at some random time interval. The consumer, however, takes some time processing the value. Thus, there will be two events: Received, which indicates that a value was sent by the producer, and Finished, which indicates that the processing has finished.

struct Product;

struct Producer {
    outgoing: QueueId<Product>,
}

struct Consumer {
    incoming: QueueId<Product>,
    working_on: Key<Option<Product>>,
}

struct ProducerEvent;

enum ConsumerEvent {
    Received,
    Finished,
}

impl Producer {
    fn produce() -> Product { ... }
    fn interval() -> std::time::Duration { ... }
}

impl Consumer {
    fn produce() -> Product { ... }
    fn interval() -> std::time::Duration { ... }
}

impl Component for Producer {
    type Event = ProducerEvent;

    fn process_event(
        &self,
        self_id: ComponentId<Self::Event>,
        _event: &Self::Event,
        scheduler: &mut Scheduler,
        state: &mut State,
    ) {
        state.send(self.outgoing, self.produce());
        scheduler.schedule(self.interval(), self_id, Self::Event);
    }
}

impl Component for Consumer {
    type Event = ConsumerEvent;

    fn process_event(
        &self,
        self_id: ComponentId<Self::Event>,
        event: &Self::Event,
        scheduler: &mut Scheduler,
        state: &mut State,
    ) {
        let busy = state.get(self.working_on).is_none();
        match event {
            ConsumerEvent::Received => {
                if busy {
                    if let Some(product) = state.recv(self.incoming) {
                        state
                            .get_mut(self.working_on)
                            .map(|w| *w = Some(product));
                        scheduler.schedule(
                            self.interval(),
                            self_id,
                            ConsumerEvent::Finished
                        );
                    }
                }
            }
            ConsumerEvent::Finished => {
                let product = state
                    .get_mut(self.working_on)
                    .unwrap()
                    .take()
                    .unwrap();
                self.log(product);
                if state.len(self.incoming) > 0 {
                        scheduler.schedule(
                            Duration::default(),
                            self_id,
                            ConsumerEvent::Received
                        );
                }
            }
        }
    }
}

Closing Remarks

In this article, I discussed my implementation of a discrete event simulation. Although the concepts are quite simple and I'm sure I haven't addressed all the issues that will come up in the future, I think there are some solid ideas here. If nothing else, it is a nice showcase of the power and expressiveness of Rust.

There are some trade-offs I made when it comes to performance. Type erasure has its cost, but it provides great flexibility. I have previously experimented with some macros, but they have their own trade-offs, and the relative simplicity of the current solution prevailed. Furthermore, you might have noticed that despite insisting that the implementation is type safe, I still needed to unwrap options with expect, which has some time overhead. I could have used unsafe in several places, but that would make debugging potential bugs much more difficult. I haven't run any benchmarks so far, which I might do at some point if my experiments end up being slow.

If you have any comments, make sure to reach out.

DEV Community