Luis Eduardo Colon for Fauna, Inc.

Posted on Jan 9, 2023 • Edited on Jan 12, 2023

Modernizing from PostgreSQL to Serverless with Fauna Part 1 (2023 Update)

#devops #productivity #testing #softwaredevelopment

With modern development practices and efficiencies in mind, you can transition from a traditional relational application to a serverless document-relational database with maximum scale and no data consistency compromises.

Introduction

Many web and mobile applications continue to be implemented atop traditional relational databases. Although many core components of these databases are five decades old, these designs are the most familiar to most database professionals. However, recently the choices have shifted toward more distributed, cloud-native options to support hyper-scale expectations.

One of the popular relational database options is PostgreSQL (A.K.A. Postgres), which dates back to UC Berkeley’s POSTGRES project in 1986. It has supported SQL since 1994 and has been ACID compliant since 2001. Like other popular relational databases, Postgres is a client-server database at its core and continues to be enhanced by an active community, with Version 15 released in late 2022.

Over its long history, Postgres has added a large set of extensions and options, including almost a dozen different data indexing options and many server replication alternatives. Multiple vendors have added data distribution options atop the core server code. In contrast, other vendors offer so-called “wire-level” compatibility atop redesigned engines that look more like less traditional key-value stores. However, most of the newest implementations still necessarily expose significant operational complexity in areas like partitioning complexity.

Some of the largest websites, driven by the requirements of serving billions of users, have led to the maturation of new database architecture alternatives, which motivate modernizing applications to leverage cloud-native, serverless, highly-distributed database options. Fauna strengthens relational data consistency among these newer alternatives while leveraging a cloud-native, serverless, flexible document model that abstracts most operational complexities from application developers.

You may be an application developer looking to build an application designed to scale well from the beginning. Or, you are a database professional looking to transition a Postgres application to minimize the amount of scaling impediments you may have encountered periodically as you continue building features. In either case, this article series is for you. This first of three articles is an updated version of an existing tutorial built by Brecht De Rooms some time ago. It explores the process of transitioning an existing Postgres application to Fauna while ensuring that you build your database in a scalable way from the beginning.

In this first part, we’ll provide a deep comparison between Fauna and Postgres while we explore how to implement and query a one-to-many relationship in a basic database. In part two, we extend the concepts to cover a many-to-many relationship and how to build a domain-specific language that avoids object-relational impedance mismatches. We explore advanced transactions with referential integrity in part three and suggest various modeling, optimization, and migration strategies.

Comparing Postgres and Fauna

Given that both Postgres and Fauna support relational database features, they share many important characteristics. Both support data relationships, serializable transactions, normalization, foreign keys, indexes, constraints, stored procedures, and many other typical relational database features. Both support ACID transactions and strict isolation levels, with Fauna default isolation levels can be considered stricter since it provides those guarantees even when data is distributed. Both are primarily aimed and are well suited for online transaction processing (OLTP) applications. Fauna focuses more on this use case because it is designed as a cloud-native database optimized for sub-second queries. At the same time, many Postgres implementations and extensions can be suited for more analytical use cases.

Notable Differences

Although both databases can implement most database requirements, their implementation varies significantly in some areas:

Relational-model vs multi-model: Postgres is often described as an object-relational database with additional data types that support documents like JSON and JSONB, but it is primarily built and used as a traditional relational database. Fauna is often described as a document-relational database with JSON documents at its core. In contrast to other common document databases, it offers relational options and flexible indexing to maintain consistency. Further, it adds temporality and graph-like features such as graph traversal.
Schemas: Both Postgres and Fauna have a concept of logical databases that include schema records describing the collections, indexes, and security properties of the included records. However, Postgres enforces schemas while Fauna is considered schema-less in that it supports adding additional fields or attributes, while still providing mechanisms to enforce integrity and uniqueness.
Database connections: Postgres expects persistent connections, like many similar relational databases. Most implementations must configure connection pools and have to be concerned with connection overhead and limits. Fauna uses stateless, secure HTTP connections and requires no connection management.
Database as an API: Postgres is built to self-host or to be hosted or managed by cloud providers, requiring managing server instances, read replicas, partitioning, and the like. There are vendors that claim that they provide serverless Postgres, but in many cases, there are still provisioning decisions to consider. On the other hand, Fauna is delivered as an API with a single global endpoint, and there are no clusters, partitions, and replication to configure. Furthermore, the single global endpoint can intelligently route a request to the closest copy of the data.
Distribution: When it comes to distribution in a traditional database, asynchronous replication is the most popular form of distribution which introduces eventual consistency and potential data loss. If consistency is required, synchronous replication is provided as an option, but it typically comes at a high price in terms of performance, especially if distribution across regions is desired. Although traditional databases like Postgres were not built with distribution in mind, many recent improvements and additions provide multiple options that supplement the non-distributed nature of its core engine. In contrast, Fauna is built from the ground up as a scalable, multi-region distributed database. It is inspired by the Calvin algorithm that speeds up data consensus by relying on deterministic calculations. Further, this distribution of data is transparent to the application developers and can accommodate data locality restrictions.
Multi-tenancy: Although Postgres provides various options to create a SaaS application where you can have multiple tenants (whether across servers or within a single database), Fauna has a unique concept of child databases with arbitrarily deep nesting. Child databases are completely isolated from each other, with the ability to create separate permissions for child databases, and the child databases cannot determine whether there’s a parent database.
Query Languages: The primary language used for Postgres is SQL or, more accurately, a SQL dialect. Although SQL has been around for many decades and is well known, documented, and standardized to the extent that it can be, every SQL database exhibits significant variations in the language. For example, to support JSON manipulation and event streaming, you will encounter significant differences between the SQL commands used in Postgres and other SQL databases like MySQL. Application developers will likely use an object-relational mapper (ORM) that adds some application complexity without removing the need to understand the Postgres-specific SQL conventions. GraphQL can be used with Postgres by adding extensions, of which there are many options to consider. Fauna provides its own native query language, FQL, which is designed to align with modern coding paradigms. Because of the nature of FQL and other design considerations, Fauna requires no ORM and is not susceptible to injection. The choice for a custom language is rooted in Fauna's scalable and distributed design. It’s designed to prevent long-running transactions, maximize performance in distributed scenarios, be highly composable to reduce round-trips, and have a transparent and predictable query plan. Many of these characteristics will become evident in this article and its subsequent parts as we cover FQL extensively in the coming sections. Beyond FQL, Fauna also supports GraphQL out of the box.

Postgres and Fauna: Terminology Mapping

The following table summarizes how Fauna concepts relate to Postgres counterparts and, in most cases, also apply to similar terms used with other relational databases. This table comes from this blog, and the Fauna documentation provides more details on these concepts and many other common SQL commands and concepts here.

POSTGRES	FAUNA	DETAILS
Row or record	Document	In Postgres, a record (or row) represents a distinct database entry, and it must conform to the containing table’s column definitions. In Fauna, a document represents a nested structure of fields and their values, with no specified structure or types. Each document, even in the same collection, can have its own independent structure, making the documents schemaless. Also, each document is versioned, storing the history of a document’s mutations from creation to deletion.
Table	Collection	Tables store records, collections store documents. In Postgres, a table’s column definition specifies the structure for all records in the table. The column definition specifies the names and types of values that can be stored in a column. In Fauna, collections are a container for documents, imposing no specific structure on those documents.
Primary	Region	Fauna has no primary or secondary concept, all regions can serve reads and writes.
Secondary, standby, replica	Region	Fauna has no primary or secondary concept, all regions can serve reads and writes.
Replication	Replication	Fauna’s replication is semi-synchronous and does not require any operator management.
Sharding, partitioning	Not Applicable	Fauna does not require the operator to manage sharding or partitioning in any way.
Primary Key	Reference (Ref)	The unique identifier of a document.
Foreign Key	Reference (Ref)	A pointer from one document to another.
Index, materialized view	Index	Fauna merges the concepts of indexes and views. Indexes must be explicitly referenced.
Transaction	Transaction	Both Postgres and Fauna support ACID transactions.
Schema, database	Database	Both Postgres and Fauna have a concept of logical databases that include schema records describing the collections, indexes, and security properties of the included records. In Postgres and other relational databases, a schema refers to the set of table definitions and constraints defined in that database. The schema is enforced so that no row violates its table definition or constraints. In Fauna, database definitions also include child databases with arbitrarily deep nesting. A Fauna database contains schemaless documents with no schema enforcement available at the document level; however, validation functions may be used when necessary. The Fauna GraphQL API does apply schema enforcement to comply with GraphQL schemas.
Stored procedures, user-defined functions	User Defined Functions (UDF) or Functions	Fauna supports user-defined functions written in FQL.

Tables and rows are collections and documents in Fauna just like in other document databases. An index in Fauna combines an index and a view as we know them in a traditional database. We’ll directly query indexes for data since indexes contain data, similar to consistent ordered views. User Defined Functions (UDFs) are similar to Stored Procedures except that, in contrast to Postgres, both queries and UDFs are written in the same language in Fauna, while in Postgres, you would split to the PL/pgSQL language for a stored procedure. Fauna users typically use UDFs much more frequently due to the easy transition from a query to a UDF.

From Postgres to Fauna: Building the Basics

In this section, we’ll learn how to create a database, a collection, insert some documents, and query them. We will focus on modeling a one-to-many relation in this article; we will model a many-to-many relation in the next article and cover other modeling options in the third and last installment. We’ll use a well-known Postgres tutorial that provides a database model of a DVD rental business. The model of the DVD rental application looks as follows:

Fig. 1.Entity relationship diagram for DVD rental business. Source: PostgreSQL Tutorial

The database schema has “film" at its center and allows customers to rent DVDs from different stores around the country, supports shopping baskets, as well as film, payment, store, and customer management. Thanks to the normalized model, it supports many access patterns: customers can find films by category, rating, release year, title, description, or actor. The staff can retrieve where a film currently resides or retrieve a list of overdue rentals.

Creating a new database

In this section, we’ll create the database. Even if you don’t follow along, this 2-click process (plus typing a name) shows how different the database-creation process is compared to setting up a traditional database.

Sign Up

Fauna is a cloud database. There is no setup of hardware or configuration necessary. All you need to do is go to dashboard.fauna.com and sign up for an account.

Create the database

Click NEW DATABASE to create a new database which will instantly be created.

Fill in a name for your database and click CREATE.

Now that you have created a database, you can follow along by pasting the provided code in the dashboard shell (or play around with the UI instead to create collections/indexes etc). Of course, you can also send queries to Fauna using one of the drivers or the terminal shell. The shell is built around the JavaScript driver behind the scenes. When writing FQL, we are not writing strings but rather functions, although that might not be obvious as we use the dashboard shell.

One-to-Many relations

Let’s start with a small and easy subset of the model and extend it gradually. The film and language relation is a many-to-one relation that should be very easy to model. Since the model remains quite simple, we’ll add a few things now and then in green to render it more interesting. Instead of one language, we’ll add both a spoken and a subtitle language.

Fig. 2.Adding a one-to-many relationship for both spoken and subtitle language.

Creating the language collection

In Postgres, we have tables that contain rows; in Fauna, we have collections that contain documents. To create a new film collection, we could use the dashboard interface but just like in SQL, manipulation and creation of collections, indexes, security roles, or even databases can be done entirely in the query language. To create the film collection, we can also paste the following code snippet in the dashboard shell:

CreateCollection({name: “language”})

Creating a language document

We can create the equivalent document for the language as follows:

Create(
   Collection("language"), 
   {
       data: {
          name: "English"
       }
   }
 )

The Create() function is used to create documents, similar to an Insert statement in Postgres. It takes the collection where you want to store the document as the first parameter and a JSON object as the second. Application data is always nested under the data key, which separates it from special Fauna fields. Once we have executed the Create() statement, we can take a look at the document by going to the Collections tab in the dashboard.

Fauna has automatically generated two fields: ref and ts, which are the Fauna equivalent of id and last_update.

ref: a unique reference to the document. The presence of references doesn’t mean that we can no longer use other IDs in Fauna. We’ll see the difference between native Fauna references and IDs shortly when we start querying.
ts: the document's timestamp, which is automatically updated when the document is adapted. Fauna’s timestamp is part of temporality features that provide time travel and support the streaming capabilities.

Note: Your reference IDs will be different. If you are following along, you can grab the reference ID of any document from the dashboard's Collections view.

Creating the film documents

To store the film documents, we need a new collection:

CreateCollection({name: "film"})

Let’s create a couple of simplified film documents. We actually have many potential choices for the relation between films and languages (at least five) since Fauna is both a document and relational database. However, anything that is not normalized is typically an optimization so let’s start with the normalized model.

We will make it more interesting and add both spoken language and subtitles language. We will refer to the previously created language by storing the native reference in the document as follows (again, your reference IDs will be different):

Create(
    Collection("film"), 
    {
       data: {
        title: "Academy Dinosaur",
        language: {
            spoken: Ref(Collection("language"), "288878259769180673"),
            subtitles: Ref(Collection("language"), "288878259769180673")
        }
      }
    }
  )

A second document might be:

Create(
    Collection("film"), 
    {
       data: {
        title: “Back to the Future",
        language: {
            spoken: Ref(Collection("language"), "288878259769180673"),
            subtitles: Ref(Collection("language"), "288878259769180673")
        }
      }
    }
  )

An alternative approach is to add the languages as embedded objects. If we do not intend to efficiently query languages directly, we could add it as an embedded object.

Create(
    Collection("film"),
    {
        data: {
            title: "Academy Dinosaur",
            language: {
                spoken: { name: "English" },
                subtitles: { name: "English" }
            }
        }
    }
)

Embedding vs normalized data

Fauna is a document database that might make us think: “don’t join data, duplicate it” as a workaround due to the limitations of popular document databases. Since Fauna is relational, such workarounds are not necessary but rather another option in your tool belt. This is an important consideration when working with Fauna versus other document databases: denormalization is a choice, not a workaround.

Postgres is not natively a document database but has, among its options, a JSONB type that can be used to emulate documents. However, such columns do not benefit from optimizations such as column statistics, are restricted to primitive operations, lack built-in functions to modify values, and may be awkward to query in SQL. The absence of column statistics blinds the query planner, resulting in potential production issues. In light of these limitations, you might resort to storing frequently queried attributes in regular columns and the remainder in JSONB columns.

In contrast, Fauna's querying and indexing capabilities do not change based on whether documents contain nested data or normalized. That doesn’t mean that denormalization is the recommended practice, but it does become more attractive as a technique to optimize read performance. We'll dive into some advanced optimization techniques at the end of this series and show how FQL can help you hit the sweet spot between flexibility, optimization, and data correction.

In some databases, indexing flexibility might suffer when nesting objects. However, Fauna’s indexing is built to be flexible regardless of whether the data is stored as nested objects or normalized. Further, Fauna indexes work equally well on nested values or arrays. For example, we could index the spoken language by adding the following path data > language > spoken > name to the index. If spoken languages had been an array instead:

{
        data: {
            title: "Academy Dinosaur",
            language: {
                spoken: [{ name: "English" }, { name: "French" }],
                subtitles: { name: "English" }
            }
        }
    }

We can add the same path to the index, and Fauna will recognize that it’s an array and unroll it. By using advanced features like index bindings, we can even combine both the spoken and subtitles language in one index to find any film in a specific language, regardless of whether it’s the spoken or subtitle language.

Before we talk about the trade-offs and even more alternative strategies, let’s see how we would retrieve films and then perform a simple join by including languages.

Querying a collection

The simplest Postgres query is easy since SQL is a declarative language where you describe what you need instead of how you want it to be calculated.

SELECT id FROM film

In contrast, FQL is a procedural language, like PL/pgSQL. Although it’s initially verbose, we’ll discover that we can extend the language to tame that verbosity.

This article focuses on replicating a traditional relational model in Fauna, providing an opportunity to introduce you to FQL. If you want to dive deeper into the fundamentals of FQL, here is an excellent guide that starts from scratch, and here's a guide to translating basic SQL queries to FQL.

Retrieving all documents starts by executing the Documents() function on a collection. The Collection() function returns the reference to the collection, and Documents() retrieves the film references from that collection.

Documents(Collection("film"))

When executing this first query snippet, you might be surprised that it simply returns something like “ok, here is a reference to a set of films” but not the actual film documents yet.

{
    "@set": {
        documents: Collection("film")
    }
}

Just like SQL, FQL is inspired by relational algebra. In FQL, we construct sets. For example, we could combine the above statement with other sets using familiar functions such as Union(), Difference(), Distinct(). A set is merely a definition of the data we would like to retrieve but has yet to be a concrete dataset.

Although it might seem like strings if we use the dashboard shell, we are actually using functions from the underlying JavaScript driver. Try misspelling a function in the dashboard shell, and you’ll get a familiar <function> undefined error. Or write:

var a = Documents(Collection("film"))
a

Make sure to copy the whole statement simultaneously, as the dashboard shell does not maintain variables between runs.

That JavaScript variable now contains our query definition, and we can use the variable to continue composing our query in a much more elegant way than string concatenation could allow. Many users use this to extend the language or construct their own DSL, which we will show extensively in the rest of the article.

Pagination

To materialize a set and retrieve the data, calling the Paginate() function is mandatory; this is an important measure to ensure scalability and to always keep transactions relatively small. This is a common best practice when consuming most interactive APIs. Since we added two films, we can add a size parameter to see the pagination in action. Fauna will return an after cursor to move to the next page.

Paginate(Documents(Collection("film")), {size: 1})

Which now returns a Page of film references:

{
  {
    after: [Ref(Collection("film"), "288877928042725889")],
    data: [Ref(Collection("film"), "288801457307648519")]
  }
}

Once we call Pagination, our set is transformed to a Page. The data types documentation lists which functions can be called on Sets, Pages, Arrays, etc.

We can then copy the after cursor to get to the next page.

Paginate(Documents(Collection("film")), {
size: 1, 
after: [Ref(Collection("film"), "288877928042725889")]
})

Data architects who have experimented with multiple ways of pagination within Postgres may want to know what kind of pagination this is. Fauna’s approach is close to the KeySet approach and cleverly takes advantage of snapshots to ensure that pages do not change when data is adapted. This is possible since everything we query in Fauna is backed by a sorted index. Just like Paginate, indexes are mandatory to avoid issuing underperforming queries.

Although we don’t seem to be using an index in the query above, Documents() is actually using a built-in index sorted by reference. By including pagination, the query above is actually more similar to the following query but with a superior form of pagination since the default page size in Fauna is 64:

SELECT * FROM film
LIMIT 64

Function Composition

In the previous queries, we only returned document references. To transform these references into the complete document data, making it more equivalent to a SELECT *, we will loop over these references with the Map() FQL function and call Get() on each of them.

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["ref"], Get(Var("ref")))
)

Besides Map() and Get(), we introduced Lambda() and Var() in the code snippet above. Lambda() is the FQL name for a simple anonymous function, which we can pass to Map to be executed on each element. Var() is used to retrieve an FQL variable (in this case, the parameter passed to Lambda). At this point, you may be wondering: can’t we retrieve the complete document directly instead of using Map()? We can do so by adding more values to the index, and we’ll extensively address the trade-offs when we talk about optimizations in the third article.

Why use the Var() and Lambda() syntax here? As mentioned before, we are composing the query by calling JavaScript functions. Using strings like “ref” as variables, and retrieving them with Var(), helps keep JS variables separate from FQL variables. The advantage of writing your query by composing JavaScript functions lies in the extensibility. For example, we can extend FQL by writing a simple function equivalent to a simple SELECT * in SQL.

function SelectAll(name) {
    return Map(
        Paginate(Documents(Collection(name))),
        Lambda(["ref"], Get(Var("ref")))
    ) 
}

With this function definition, we would then call it as:

SelectAll(“film")

This composability allows you to use Fauna in ways that would be infeasible or incredibly hard in other databases, as we’ll see later when we cover optimization strategies. A prime example is the Fauna GraphQL query which compiles one-to-one to FQL, maintaining the same database guarantees. This is relatively easy in Fauna but requires advanced techniques in traditional databases.

Querying a one-to-many relation: multiple options

Whether we decide to embed the additional data in a collection document or use explicit references, we have multiple options to retrieve the data related in our one-to-many relation.

Querying embedded documents

If we had embedded the language document by storing the languages directly in the film document, we would not have to change the query, simply returning the film documents would include the languages.

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["ref"], Get(Var("ref")))
)

Retrieving the film document with Get() would immediately return the complete document, including the languages.

{
    ref: Ref(Collection("film"), "289321621957640705"),
    ts: 1612177450050000,
    data: {
      title: "Academy Dinosaur",
      language: {
        spoken: {
          name: "English"
        },
        subtitles: {
          name: "English"
        }
      }
    }
  }

Querying normalized data with native references

Earlier, we chose to store native references instead of embedding the documents, like we depict below:

{
    data: {
        title: "Academy Dinosaur",
        language: {
            spoken: Ref(Collection("language"), "288878259769180673"),
            subtitles: Ref(Collection("language"), "288878259769180673")
        }
    }
}

This means that we need to write the equivalent of a Postgres join. In Postgres, that would look as follows:

SELECT * FROM film 
JOIN "language" as spol ON spol.language_id = film.spoken_language_id 
JOIN "language" as subl ON subl.language_id = film.subtitles_language_id 
LIMIT 64

In Postgres, we define the join we would like to see and rely on the query optimizer to select the right algorithm. If the query optimizer makes a wrong judgment, query performance can suffer significantly. Depending on the data and the way we join it, the join algorithm could differ and even change when the size of the data changes.

Due to the scalable nature of Fauna, we want predictability in terms of price and performance. We have talked about set functions such as Union(), Difference(), and Distinct(), and we can accomplish this without using a Join() statement in Fauna yet. Although joins can scale in general, their performance will depend on many factors. To ensure predictability, we’ll join on the materialized documents after pagination. We’ll retrieve the film document within the lambda and paginate on each level, as we’ll see when we tackle many-to-many joins.

Let’s continue incrementally building upon the previous query, which returns our film documents with language references. We’ll start here, as before:

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["ref"], Get(Var("ref")))
)

First, we'll slightly refactor it to bind variables with Let(), bringing more structure to our queries and rendering them more readable. Within a Let(), we can retrieve anything related to a film we desire. Here’s our next incremental iteration:

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["filmRef"], 
        Let({
           film: Get(Var("filmRef"))
        }, 
        // for now, we’ll return the film variable.
        Var("film")
    ))
)

Next, we retrieve both language references from the film document with Select() and get the actual languages by dereferencing the spoken and subtitles language reference with Get().

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["filmRef"],
        Let({
            film: Get(Var("filmRef")),
            spokenLang: Get(Select(['data', 'language', 'spoken'], Var("film"))),
            subLang: Get(Select(['data', 'language', 'subtitles'], Var("film")))
        },
        // todo return
        )
    )
)

And finally, return these variables in the structure we specify. We can now try this complete function in the shell:

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["filmRef"],
        Let({
            film: Get(Var("filmRef")),
            spokenLang: Get(Select(['data', 'language', 'spoken'], Var("film"))),
            subLang: Get(Select(['data', 'language', 'subtitles'], Var("film")))
        },      
        // return a JSON object
        {
            film: Var("film"),
            language: {
                spoken: Var("spokenLang"),
                subtitles: Var("subLang")
            }
        })
    )
)

The complete result is now nicely structured, with both films and both languages.

{
  data: [
    {
      film: {
        ref: Ref(Collection("film"), "352445336554307667"),
        ts: 1672376915430000,
        data: {
          title: "Academy Dinosaur",
          language: {
            spoken: Ref(Collection("language"), "352444818516869204"),
            subtitles: Ref(Collection("language"), "352444818516869204")
          }
        }
      },
      language: {
        spoken: {
          ref: Ref(Collection("language"), "352444818516869204"),
          ts: 1672376421400000,
          data: {
            name: "English"
          }
        },
        subtitles: {
          ref: Ref(Collection("language"), "352444818516869204"),
          ts: 1672376421400000,
          data: {
            name: "English"
          }
        }
      }
    },
    {
      film: {
        ref: Ref(Collection("film"), "352447504810246229"),
        ts: 1672378983250000,
        data: {
          title: "Back to the Future",
          language: {
            spoken: Ref(Collection("language"), "352444818516869204"),
            subtitles: Ref(Collection("language"), "352444818516869204")
          }
        }
      },
      language: {
        spoken: {
          ref: Ref(Collection("language"), "352444818516869204"),
          ts: 1672376421400000,
          data: {
            name: "English"
          }
        },
        subtitles: {
          ref: Ref(Collection("language"), "352444818516869204"),
          ts: 1672376421400000,
          data: {
            name: "English"
          }
        }
      }
    }
  ]
}

Querying normalized data with user-defined primary keys

In the above example, we had access to language references in the film document. Therefore, we could directly use Get() to dereference these references. We could have chosen user-defined primary keys instead of native Fauna references like this:

Create(
    Collection("film"),
    {
        data: {
            title: "Academy Dinosaur",
            language: {
                spoken: 6,
                subtitles: 6
            }
        }
    }
)

and

Create(
    Collection("language"), 
    {
       data: {
          id: 6,
          name: "English"
       }
    }
  )

Fauna is still able to retrieve the languages, but it would require an extra step with an index. For the sake of comparison, let’s implement the same query again with this use case. First, we need an index to retrieve the languages by id:

CreateIndex({
    name: "language_by_id",
    source: Collection("language"),
    terms: [
      {
        field: ["data", "id"]
      }
    ],
    values: [
      {
        field: ["ref"]
      }
    ]
})

The main ingredients of indexes in Fauna are terms and values. Terms determine what the index matches on, while values determine what it returns (and in what order). Since indexes return values in Fauna, they are a mix of a view and an index as we know them in Postgres. In fact, we can significantly optimize our queries with indexes once we know our application’s data access patterns, as we’ll explain later. In this case, we’ll start in a generic fashion and only return the language reference from the index.

We’ll call the index with the Match() function. As long as we only need the first result from Match(), we can call Get() on the match. We will see how to handle multiple results later on.

Map(
    Paginate(Documents(Collection("film"))),
    Lambda(["filmRef"],
        Let({
            film: Get(Var("filmRef")),
            spokenLangId: Select(['data', 'language', 'spoken'], Var("film")),
            subLangId: Select(['data', 'language', 'subtitles'], Var("film")),
            spokenLang: Get(Match(Index("language_by_id"), Var("spokenLangId"))),
            subLang: Get(Match(Index("language_by_id"), Var("subLangId"))),
        },
            // and return the values however you want.
        )
    )
)

At this point you may wonder, should I use native Fauna references or user-defined keys? You can choose either. If you used custom keys or IDs in the Postgres implementation you are migrating from, you may have other business logic reasons to continue using them. If you use Fauna’s native references, you may simplify your code and retrieve data more efficiently (similar to what graph databases call index-free-adjacency).

And you can set your own custom IDs within Fauna's native references during document creation:

Create(
    Ref(Collection('language'), '6'),
    { name: { title: 'English' } },
)

Summary

We covered a lot of ground here. First, we outlined the key motivations why application developers are considering moving away from traditional relational databases like Postgres. Even with Postgres’ feature maturity, scaling traditional database applications horizontally is still challenging, especially if the application is to handle very significant traffic loads. We covered key differences between Postgres and Fauna and provided a terminology mapping that should have demonstrated the equivalence of many shared concepts.

We went through the process of creating a database, building a couple of collections, and inserting documents in those collections to illustrate a typical one-to-many database relation from one collection to another. We covered using native Fauna references, as well as the alternatives of embedding and denormalizing the data, as well as using user-defined IDs, and how those decisions change how we build queries to retrieve the data in all those cases. We used FQL to incrementally compose functions that look a lot like JavaScript, illustrating the deep integration that can be achieved in your application code without requiring an ORM or avoiding embedded SQL strings that can be prone to injections.

This sets us up for part two, where we use a many-to-many relationship example to cover more modeling strategies and extend the idea of writing a domain-specific language with FQL. Finally, part three focuses on referential integrity as well as additional modeling, optimization, and migration ideas. By the time you complete this journey, you should be armed with a solid foundational strategy to transition your Postgres-driven application. This strategy will provide you with enough guidance to have a scalable and performant resulting system from the beginning. It will also set you up with a composable and testable code base that will help you maintain your code leveraging modern application best practices.

About the author

Luis Colon is a data scientist that focuses on modern database applications and best practices, as well as other serverless, cloud, and related technologies. He currently serves as a Senior Technology Evangelist at Fauna, Inc. You can reach him at @luiscolon1 on Twitter and Reddit.

DEV Community