Data structure in Mineral

Introduction

Data structures are at the heart of every IT project, playing a decisive role from the earliest design stages.

They define the way information is organised, stored and manipulated, directly influencing an application's performance, maintainability and scalability.

A well thought-out data structure makes it possible to solve complex problems efficiently, optimising resources and reducing execution times.
Conversely, an inappropriate choice can lead to major limitations and additional costs in the long term.

So it's essential to think carefully about data structures from the outset of a project, to lay the foundations for a robust, scalable solution.


Context

From the start, we wanted Mineral to offer data models ready for immediate use by the end developer.

So, following the approach of the Discord.js library, we chose to make all the data available for a Discord server accessible from any model.

This choice implies that each time we receive an event from the Discord websocket client, we have to serialise the data concerned by the event, as well as everything reachable from it, at any level of depth.

As each structure had to be traceable back to its parent (and vice versa), we quickly found ourselves serialising a huge amount of data for a single, simple structure.

An example will make this more concrete.


Issues

Let's say we receive the MessageCreate event from Discord's websocket client; we write our handler like this.

client.events.server.messageCreate((ServerMessage message) {
  // Our code
});

Serialisation is costly

Following this premise, we need to allow access to our Discord server from our message model; to do this, we chain something like message.channel.server.

Behind the simplicity of this chain, there are a number of implications:

  1. The message must contain a complete structure of the Channel into which it was sent.
  2. The Channel must be able to make accessible the complete structure of the Server in which it evolves.
  3. The Server model embeds a whole set of properties, including its own channels, members and roles... each of which must, in turn, be able to trace back to the server itself...

And so on...
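
To make the problem tangible, here is a rough sketch of the circular nesting described above; the class shapes are hypothetical illustrations, not Mineral's actual models. Every model carries its parent and the parent carries every child, so serialising one message means serialising the whole graph.

class Server {
  final String id;
  final List<Channel> channels; // each channel points back to this server
  final List<Member> members;   // each member points back as well

  Server(this.id, this.channels, this.members);
}

class Channel {
  final String id;
  final Server server; // back-reference to the parent server

  Channel(this.id, this.server);
}

class Member {
  final String id;
  final Server server; // back-reference to the parent server

  Member(this.id, this.server);
}

class ServerMessage {
  final String id;
  final Channel channel; // message -> channel -> server -> channels -> ...
  final Member author;

  ServerMessage(this.id, this.channel, this.author);
}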

All these structures, nested in a multitude of levels, generate enormous computational complexity and have a significant impact on the cost of serialising each data structure.

Our main concern is the serialisation cost that even the smallest data structure has to bear.

The computational complexity keeps growing, again and again.

It is important to note that the full, complex data structure is transmitted to the client even when the handler does not require any level of depth.

client.events.server.messageCreate((ServerMessage message) async {
  await message.reply(content: 'Hello World');
});

In this case, we only need the message instance and never use a message.channel.server chain; yet the server is still serialised and supplied to the client.

Cache dependency

Our philosophy is not to impose anything on developers.
One of our promises is not to impose the use of any caching solution.

Behind this promise, we want the framework to stand as a stable and viable solution for any project.

In this section, we will use the Discord.js library as a reference for comparison.

This library imposes the use of an in-memory caching solution, which lets it offer complex data structures such as those described above and gives developers access to almost any property or action available on the Discord server.

Imposing a caching solution actually hides a deeper problem: it leads to excessive memory consumption in the application.
Conversely, without an in-memory cache, Discord.js simply cannot deliver these complex data structures.

It is possible to envisage a system that introduces a notion of lifespan or life cycle of the data, but it is up to the developer to work this out.
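
As an illustration only, such a lifespan-based cache could look like the sketch below; the CacheEntry and TtlCache names are hypothetical and not part of Mineral.

class CacheEntry<T> {
  final T value;
  final DateTime expiresAt;

  CacheEntry(this.value, this.expiresAt);
}

class TtlCache<T> {
  final Duration ttl;
  final Map<String, CacheEntry<T>> _entries = {};

  TtlCache(this.ttl);

  // Store a value with an expiry date computed from the configured lifespan.
  void put(String key, T value) {
    _entries[key] = CacheEntry(value, DateTime.now().add(ttl));
  }

  // Return the value if it is still alive, otherwise evict it and return null.
  T? get(String key) {
    final entry = _entries[key];
    if (entry == null) return null;

    if (DateTime.now().isAfter(entry.expiresAt)) {
      _entries.remove(key);
      return null;
    }

    return entry.value;
  }
}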


Solution adopted

In order to overcome all the problems explained above, we decided to drastically reduce the size of our data structures to make them as minimalist and atomic as possible.

This choice has enabled us to:

  • reduce the dependency of each data structure on the others
  • reduce the size of our data structures
  • reduce serialisation time
  • reduce the overall complexity of using a resource
  • eliminate potential errors linked to the absence of properties required for complete serialisation of the complex data structure
  • eliminate dependency on a caching solution
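
To give an idea of the result, here is a rough sketch of what an atomic structure looks like after this change; the field names are illustrative, not Mineral's exact API. Instead of embedding the full Channel and Server graphs, the message only keeps the identifiers needed to fetch them later.

class ServerMessage {
  final String id;
  final String channelId; // identifier only, no nested Channel structure
  final String serverId;  // identifier only, no nested Server structure
  final String authorId;  // identifier only, no nested Member structure
  final String content;

  ServerMessage({
    required this.id,
    required this.channelId,
    required this.serverId,
    required this.authorId,
    required this.content,
  });
}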

Impact on DX

It is important to note that these profound changes will result in a modification of the development experience for the end user.

As each delivered structure is smaller, the developer will have to explicitly request the retrieval of certain resources required for their business context.

An example of this is the retrieval of a server from a Channel using a newly introduced function resolveServer().

We chose the term resolve for the specific case where our data model already contains every piece of information needed to construct an HTTP request (usually to retrieve a resource), without asking the developer for any additional parameters.
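
As a sketch of the idea only (the DataStore interface and fetchServer method below are assumptions made for the example, not Mineral's actual implementation), a resolve method simply turns an identifier the model already holds into the full resource:

class Server {
  final String id;
  final String name;

  Server(this.id, this.name);
}

abstract class DataStore {
  Future<Server> fetchServer(String serverId);
}

class Channel {
  final String id;
  final String serverId;
  final DataStore _dataStore;

  Channel(this.id, this.serverId, this._dataStore);

  // The channel only stores serverId; resolveServer turns it into a full
  // Server by delegating the HTTP request to the datastore.
  Future<Server> resolveServer() => _dataStore.fetchServer(serverId);
}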

We can now see a new way of accessing our data.

client.events.server.messageCreate((ServerMessage message) async {
  if (message.authorIsBot) {
    return;
  }

  final (channel, author) = await (
    message.resolveChannel(),
    message.resolveMember()
  ).wait;

  final str = 'Hello from ${author.username} in ${channel.name}';
  await message.reply(content: str);
});

Caching

We have already stated our desire to let the end developer dispense with any caching solution, but make no mistake: caching remains extremely useful and important in your applications.

Using one will drastically reduce the execution time of your code and the number of requests made to the Discord API, thereby lowering the risk of being rate-limited.

The current procedure is as follows.

Data flow

There are two possible cases:

Without cache

When no caching solution is used, the Datastore will make a direct HTTP request to the Discord API to obtain the result, then serialise it and send it back to the consumer.

With cache

In this case, there are two possible scenarios.

The cache has the data
The Datastore contacts the caching solution, retrieves the result, serialises it and sends it to the consumer.

The cache does not have the data
In this case, the Datastore makes an HTTP request to the Discord API to retrieve the result. The result is then pushed into the cache, serialised and sent back to the consumer.
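
Put together, the flow can be sketched as follows; the CacheProvider interface and the exact endpoint details are assumptions made for this example, not Mineral's real interfaces.

import 'dart:convert';

import 'package:http/http.dart' as http;

abstract class CacheProvider {
  Future<Map<String, dynamic>?> get(String key);
  Future<void> put(String key, Map<String, dynamic> value);
}

class Datastore {
  final CacheProvider? cache; // optional: the framework never imposes one
  final String token;

  Datastore({this.cache, required this.token});

  Future<Map<String, dynamic>> fetchChannel(String channelId) async {
    // With a cache configured: try it first and return immediately on a hit.
    final cached = await cache?.get('channels/$channelId');
    if (cached != null) {
      return cached;
    }

    // Without a cache (or on a miss): go straight to the Discord API.
    final response = await http.get(
      Uri.parse('https://discord.com/api/v10/channels/$channelId'),
      headers: {'Authorization': 'Bot $token'},
    );
    final payload = jsonDecode(response.body) as Map<String, dynamic>;

    // On a miss with a cache configured, push the result before returning it.
    await cache?.put('channels/$channelId', payload);
    return payload;
  }
}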

Credits
We would like to thank Lexedia and Abitofevrything from the Nyxx team for the discussions and advice that led to this result.

See more in the documentation
