Byron Polley

Posted on May 27, 2020 • Edited on Jan 13, 2022 • Originally published at byronpolley.com

Modelling Data in FaunaDB - a primer

#fauna #database #jamstack #serverless

When beginning a project there are a multitude of technical decisions to be made. Some have little to no effect on the final outcome and are purely preferential (more on why this still matters towards the end), while others deserve more care. One such decision is data persistence and management. A rushed data layer has the potential to cripple an entire codebase. Many of the effects are knock-on rather than direct but, simply put, the time it takes a developer to complete a task depends on how intuitively the initial stack was thought out.

Depending on the database you choose, the initial data modelling time varies, but one platform that excels in this area is FaunaDB, primarily because it blurs the lines between existing modelling paradigms.

For example, FaunaDB indexes can act as a primary key (there is an example of this further on), perform exact matches and range queries, sort results, and serve as an efficient view that gives you fast access to a subset of your data, thereby trading storage for compute. You can even combine data from multiple collections in one index. In one way they prevent over-fetching, much as GraphQL does; in another they are comparable to a combination of SQL views and indexes, except scalable over multiple machines.
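To make the sorting and range behaviour concrete, here is a minimal sketch. It assumes a "songs" collection already exists and that each song document has a "title" field; the field and the index name are illustrative, not part of the example built later in this post.

```fql
// Hypothetical index: no terms, so it covers every document in "songs".
// Entries are ordered by the listed values: title first, then ref.
CreateIndex({
  name: "songs_sorted_by_title",
  source: Collection("songs"),
  values: [
    { field: ["data", "title"] },
    { field: ["ref"] }
  ]
})

// Sorted read: each entry is a [title, ref] pair, in ascending title order.
Paginate(Match(Index("songs_sorted_by_title")))

// Range read: restrict the same set to entries bounded by "A" and "M".
Paginate(Range(Match(Index("songs_sorted_by_title")), "A", "M"))
```

Because the title is stored in the index itself, the sorted read does not need to fetch each song document, which is the storage-for-compute trade described above.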

There are a number of databases and paradigms and each has varying ways of describing functionality, management and technical terms.

Terminology Comparisons

DATABASE	FaunaDB	MySQL	MongoDB	DynamoDB	Firestore
PARADIGM	Serverless Cloud Database	Relational DBMS	NoSQL Document Store	Non-relational NoSQL Database	NoSQL Document Cloud
CONTAINER	Collection	Table	Collection	Table	Collection
RECORD	Document	Row	Document	Item	Document
QUERIES	Fauna Query Language (FQL)	Structured Query Language (SQL)	MongoDB Query Language (MQL)	Console and CLI based	Methods and Listeners

The Evolution of Data Persistence

There have been unique developments in state since some of the first databases came around. Despite this, databases seem to be the one area that stays relatively stagnant and non-innovative, and for good reason: relational database systems are still hugely preferred as trusted, well-tested platforms for storing data. Given this reliance on traditional systems, we will focus on contrasting SQL with FaunaDB, since according to DB-Engines the majority of users currently use relational systems.

Although relational is still the most popular database paradigm, it is not necessarily the most apt comparison, because the emergence of serverless cloud computing has altered the landscape irrevocably. To understand how data modelling is different in FaunaDB we have to understand where we are now.

NoSQL databases aim to solve the many ways in which relational databases no longer support modern development workflows. In fact, when comparing data stores optimised for these conditions - distribution and scalability - the lines blur not only between database structures but also between the ways in which we interact with the data. FaunaDB improves upon the NoSQL approach while minimising the barrier to entry by using familiar syntax and terminology. Documents are stored in enhanced JSON and so support all of the basic data types, meaning there are no new terms to learn. These are extended by the addition of sensible special types.

Data Types

Basic Types	Special Types
Boolean	Bytes
Null	Date
Number	Page
String	Query
Literal	Ref
Array	Set
Object	Timestamp

Data types in FaunaDB are familiar territory.
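As a rough illustration of how the special types sit alongside the basic ones, the sketch below stores a Date, a Timestamp and a Ref in a single document. It assumes "albums" and "artists" collections exist; the "released", "added_at" and "artist" field names are made up for this example.

```fql
Create(Collection("albums"), {
  data: {
    album_id: "1",                           // String (basic type)
    released: Date("2020-05-27"),            // Date
    added_at: Now(),                         // Timestamp of the current transaction
    artist: Ref(Collection("artists"), "1")  // Ref pointing at another document
  }
})
```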

The option to nest objects as deep as you need is present but the ability to normalise your data allows those familiar with SQL to construct data in a relational fashion. This is one of the many ways that FaunaDB combines the best parts of relational databases with the modern day platforms that value scale, multi-region availability and data malleability.
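The two approaches can be sketched side by side. This is only an illustration: the "labels" collection, the "label" field and the sample names are assumptions, and each statement is meant to be run as its own query.

```fql
// Nested: the label's details are embedded directly in the artist document.
Create(Collection("artists"), {
  data: {
    name: "Example Artist",
    label: { name: "Example Label", country: "ZA" }
  }
})

// Normalised: the label lives in its own (hypothetical) collection...
CreateCollection({ name: "labels" })

Create(Ref(Collection("labels"), "1"), {
  data: { name: "Example Label", country: "ZA" }
})

// ...and the artist document stores only a Ref to it.
Create(Collection("artists"), {
  data: {
    name: "Another Artist",
    label: Ref(Collection("labels"), "1")
  }
})

// Following the relationship is a Get on the stored Ref.
Get(Ref(Collection("labels"), "1"))
```

Nesting keeps everything in one read, while the normalised form lets many artists share one label document that can be updated in a single place.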

The addition of GraphQL puts the database alongside a new wave of tools for adding state to modern applications, such as MongoDB Atlas and even Hasura, which combines PostgreSQL with a GraphQL API to offer relational, easily queried data stores. However, handling data in FaunaDB is unique and representative of a new tier in database evolution (serverless databases) because it is purpose-built to provide low-latency access, auto-scaling and multi-region distribution without sacrificing data consistency. This is made possible by the Calvin protocol, which inspired FaunaDB's design and allows for ACID compliance in a distributed system. FaunaDB is inherently designed to work effectively for relational, document and graph models. Let's run through an example modelling scenario that contrasts the processes.

UML for FQL (Fauna Query Language)

When modelling a relational database you will often have to create a UML diagram. This is a necessary step to conceptualise the data beforehand. Represented is a small portion of a database that may exist on an online music platform.

A UML diagram, often associated with MySQL but applicable to relational databases in general.

This is the conventional method of designing data. Many associate it with SQL because the two are practically synonymous. FaunaDB supports this workload but is more flexible when scaled. Similar concepts exist, such as adding an index that functions as a primary key if we are certain that every document should contain this field. In FaunaDB this can be done after the initial collection has been created, whereas changing primary keys across tables in a traditional relational database deployed to production would require remodelling. Feel free to reference Fauna's cheat sheet for FQL aimed at users coming from SQL.

Let's set up our basic structure using FQL.

CreateCollection is similar to CREATE TABLE in SQL, and CreateIndex is similar to CREATE INDEX.

CreateCollection({name: "artists"});

CreateCollection({name: "songs"});

CreateCollection({name: "albums"});

We are emulating the primary keys of each respective table by specifying unique as true.

CreateIndex({ 
  name: "artists_by_id", 
  source: Collection("artists"), 
  terms: [{ field: [ "data", "artist_id" ] }], 
  unique: true 
})

CreateIndex({ 
  name: "songs_by_id", 
  source: Collection("songs"), 
  terms: [{ field: [ "data", "song_id" ] }], 
  unique: true 
})

CreateIndex({ 
  name: "albums_by_id", 
  source: Collection("albums"), 
  terms: [{ field: [ "data", "album_id" ] }], 
  unique: true 
})
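To see one of these indexes acting as a primary key, here is a small sketch. The "name" field and its value are placeholders; only "artist_id" is required by the index defined above.

```fql
// Insert an artist whose artist_id is "1".
Create(Collection("artists"), {
  data: { artist_id: "1", name: "Example Artist" }
})

// Fetch that artist through the unique index, much like a primary key lookup.
Get(Match(Index("artists_by_id"), "1"))

// Because the index is unique, a second Create with artist_id "1"
// would be rejected rather than producing a duplicate.
```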

It is also useful to note that, although it is recommended to model your data beforehand, even in the most basic of forms, to better understand your system, it is not absolutely necessary because FaunaDB structures data in an easily adjustable way. There is also support for unlimited child databases, which enables multi-tenancy with no cross-database data pollution. For example, two child databases can be created if a second record label on your system catalogues its artists in a different manner. Furthermore, those labels may create additional child databases of their own, which operate in isolation. We are going to use a child database one layer deep for our example. Each label could then have a portal to manage all of its artists without remodelling all over again. The above queries could simply be executed within one of these "Record Label" child databases. The child databases are created with CreateDatabase({name: "record_label_1"}).

Suppose we wanted to add a "Genre" field that could be associated with an Artist. A musician's career may evolve and therefore span multiple genres. However, a single genre can also be assigned to multiple artists. This is a common example of what SQL users know as a many-to-many relationship. We would create a collection for genres and populate it.
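As a sketch of how a label's portal might be given access to its own child database only, the queries below create the database and then a key scoped to it. This assumes the queries are run against the parent database with an admin key; the choice of the "server" role is illustrative.

```fql
// Create the child database for the first record label.
CreateDatabase({ name: "record_label_1" })

// Create a key scoped to that child database. The returned document
// contains a secret that can only reach data inside "record_label_1".
CreateKey({
  database: Database("record_label_1"),
  role: "server"
})
```

Queries sent with that secret, including the CreateCollection and CreateIndex calls above, then operate purely inside the child database.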

CreateCollection({ name: "genres" });

Foreach( ["Hip-Hop", "Electronic", "Rock"], 
  Lambda("genre", Create(Collection("genres"), { data: { name: Var("genre") } })))

Then an index is created on this collection so that we can search it using the genre name.

CreateIndex({ 
  name: "genre_by_name",
  source: Collection("genres"),
  terms: [{ field: ["data", "name"] }],
  unique: true 
})

We create a relationship between the artist with an ID of 1 and the genre "Rock" by storing both refs in a linking document. Because both indexes are unique, each lookup resolves to exactly one document, so our data will be accurate. To create a relationship in the other direction we could simply swap the indexes.

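// The linking collection must exist first; run this as its own query.
CreateCollection({ name: "music_type" });

Create(
  Collection("music_type"),
  {
    data: {
      // Resolve each side of the relationship to a document ref.
      artist: Select("ref", Get(Match(Index("artists_by_id"), "1"))),
      genre: Select("ref", Get(Match(Index("genre_by_name"), "Rock")))
    }
  }
)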
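The lookup that follows queries an index named "music_type", which the steps so far have not defined. A minimal sketch of such an index, assuming it should be searchable by the artist ref and return refs to the linking documents, might be:

```fql
// Index over the linking collection, searchable by the stored artist ref.
// With no "values" specified, each match is the ref of a music_type document.
CreateIndex({
  name: "music_type",
  source: Collection("music_type"),
  terms: [{ field: ["data", "artist"] }]
})
```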

Finally, to get a list of all genres associated with an artist we would run the query below. Notice the Lambda, an anonymous function that runs FQL code.

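Map(
  Paginate(
    Match(
      Index("music_type"),
      // Look up the artist's ref through the unique ID index.
      Select("ref", Get(Match(Index("artists_by_id"), "1")))
    )
  ),
  // Each result is the ref of a linking document; return the genre ref it stores.
  Lambda("link",
    Select(["data", "genre"], Get(Var("link")))
  )
)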

This pattern is typically found in graph modelling, but here it is combined with the best aspects of the relational and document models.
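Traversing the relationship in the other direction, from a genre to its artists, can be sketched in the same way. The index name "music_type_by_genre" is an assumption, and the index and the query need to be run as separate queries.

```fql
// Hypothetical index: search linking documents by genre ref and
// return the stored artist ref directly as the index value.
CreateIndex({
  name: "music_type_by_genre",
  source: Collection("music_type"),
  terms: [{ field: ["data", "genre"] }],
  values: [{ field: ["data", "artist"] }]
})

// All artist documents linked to the genre "Rock".
Map(
  Paginate(
    Match(
      Index("music_type_by_genre"),
      Select("ref", Get(Match(Index("genre_by_name"), "Rock")))
    )
  ),
  // Each result is an artist ref, so a single Get returns the artist.
  Lambda("artistRef", Get(Var("artistRef")))
)
```

Returning the artist ref as an index value saves the extra read of the linking document that the artist-to-genre query performs.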

Developer Experience (DX)

In this fashion FaunaDB simplifies the data-modelling process, which brings us back to how choosing a data store may be more a matter of preference, or simply habit, than we realise. The database adapts to your data, not vice versa. This shortens the data modelling process because the various keys, indexes and even entire databases can be defined as and when you need them. When designing traditional tables you may over-engineer data that is not critical and thus consume development time. FaunaDB is flexible in that a model can be altered after creation without complicating every other area of the modelling process, and it is truly serverless. All actions are charged per usage on completely managed infrastructure: there are no resources to manage and you do not pay for idle time. The introduction of FaunaDB Data Manager also provides a supportive system to migrate, back up and restore data.

It is important to remember that developers are also users.

There is a large push towards the JAMStack as it allows the developer to focus on building what makes their product unique. By taking the complexity out of data modelling and allowing the most commonly used models to be constructed in a matter of minutes, FaunaDB is the first database that fits into this new tech stack because it was engineered to work differently from anything else on the market today. One of the most interesting consequences of this flexibility is that developers can create their own data modelling paradigms to suit the requirements of each project. Once FQL is familiar territory we will begin to see alternate ways of modelling data that simply weren't possible using existing databases. In fact, the paradigms discussed here are provided to draw out similarities and contrasts, but in my time using the platform it has become clear that it requires a new outlook on state management: one that is as accessible as the other tools it is being used with and one that will surely mature as time passes. (The company is 9 years young; to put this into perspective, SQL grew out of the relational model proposed in 1970.)

It is also possible to upload a GraphQL schema to auto-generate the database, and to mix and match the required models as the data evolves. This is what brings the system into a new wave of ways to manage data, and it may ultimately prove that the best way to model your data is to not model it at all.

Oldest comments (5)

Aaron Mead • May 30 '20

Thanks for sharing all of this Byron, I liked that you described the "why" behind a lot of these concepts.

One thing that stuck out to me was the use of child databases and how they could each have their own schemas. That seems cool, but in practice, how can you safely build a user interface around that when you can never really know what underlying data you have access to? Perhaps it's only for a certain use case.

Byron Polley • Jun 1 '20 • Edited

Thanks for the comment.

The multi-tenancy feature is designed to be used on a project or team basis in order to intentionally split the data. So one child database can be used for your product and another for your internal ops for example. This means that each assigned team would be aware of the way they decided to design their data.

I create one database per app and then decide how it should be modelled. With regards to not knowing what underlying data, this is the purpose of indexes which can be created to query data in whatever way you deem fit :)

Hopefully this answers your questions.

Aaron Mead • Jun 2 '20

Ah okay that clears it up a bit, thanks for the explanation there

Emil • May 23 '21

Great post, what did you use to create the nice looking UML diagram?

Jim Bridger • Sep 13 '21 • Edited

So I found it; it's DrawSQL.

You may also be interested to know that the way I found it was using Yandex Image Search to find similar images. I tried a few other sites for finding similar images and they all failed, but Yandex nailed it with the first result.

Have a good day :)