Kirk Kirkconnell

Posted on Jun 21 • Originally published at fauna.com

Flexibility Meets Structure: Evolving Document Database Schemas with Fauna

#database #nosql #serverless #devops

The debate over strict schema definition and enforcement versus going schemaless with NoSQL databases often sparks passionate discussions. For the longest time, I was in the camp of “I hate the word schemaless” when it came to NoSQL databases…and I am not someone who uses the term hate lightly. I was squarely in the “you must have a schema” camp. “Know your access patterns!” And while I still think every production app using NoSQL needs a schema and data model to perform well and be cost-effective, I have softened my “I hate schemaless” ideology. Why? It depends on where you and your team are in the development or application lifecycle and what kind of data you have. Early on, you may not know all your data access patterns or how your data relates. Over time, that likely changes, and the database schema and data model need to change with you. I have also softened my stance because features in NoSQL databases have evolved over the years. This is especially true recently, but more on that in a bit.

Strict schemas offer data integrity, static typing, computed fields, and predictability, which many developers value highly but do not usually associate with NoSQL databases. On the other end of the spectrum, schemaless design provides flexibility and saves time, allowing unstructured data to be added easily. While this can work in some cases, most apps need more structure and controls for long-term cost-effectiveness, performance, and data integrity.

I will give you an example. I looked at a former coworker’s data model a few years ago and was surprised. He was simply dumping JSON into the database. For the app he was working on, it worked…for the moment. If he needed to scale to even a thousand or more ops/sec, he would have had problems with both performance and hard costs. I almost presented him with a better data model, but he was hesitant to change anything. Changing the data model or schema in the database on his platform would have been a major task, and that platform lacked controls to maintain data integrity, given his coworkers’ involvement. It offered no help with migrations either.

I have heard this from developers hundreds of times in my years with NoSQL databases. “What if I get my shard key wrong?” “What if I choose the wrong partition key?” Most databases give you the freedom to design a data model but then punish you for making incorrect decisions or just needing to change things when an app design changes. “You’re on your own,” is what most databases essentially say, as they don’t make fixing the issue easy.

Fauna’s latest additions to its Schema features change all of this. They introduce document type enforcement, including field definitions and wildcard constraints, as well as zero-downtime schema migrations. These features, along with the previously released check constraints and computed fields, change how we can approach schemas and data modeling in a NoSQL document database. The beauty of this release is that you now have strict schema control and enforcement tools, but you don’t have to make those potentially difficult decisions upfront. Even better, zero-downtime migrations relieve the anxiety of “did I get this data model correct?” The new features allow you to start completely schemaless and add a stricter schema and enforcement over time as your application evolves. They let you migrate from your existing schema, or lack thereof, to your new schema in a controlled, methodical, and scripted fashion. There’s a reason why Fauna is called a document-relational database.

Anyhow, let’s jump into the release features and see exactly what’s here and why it matters.

Document types

Document types enable you to codify and enforce the shape of the data you want for a collection. That includes which fields a document in the collection can have, which values those fields can take, whether they are optional, whether a document can have fields beyond those defined, and so on. To put it another way, you create a collection named Product and define what the product documents in that collection must look like structure-wise. Fauna then rejects any write or update operation that does not conform.

Whether you stay schemaless, add some field definitions alongside a wildcard constraint so you can still use ad-hoc fields, or go fully strict and only allow a finite list of fields, Fauna will enforce what you define as the schema for that collection.

Field definitions and schema enforcement

First up is field definitions. With them, you can define each field for documents in a collection as one or more data types, a reference to another document, or a set of enumerated values, and you can add a wildcard constraint. You can also set whether each listed field must be present or is optional. Prior to this latest release, you could already set a unique constraint on a single field or a combination of fields.

For example:

collection Order {
  user: Ref<User>
  cart: Array<Ref<Product>>
  address: String | Ref<Address>
  name: String?
  status: "in-progress" | "completed" | "error" = "in-progress"
  *: Any
}

I define a collection named Order with five fields and a wildcard constraint:

1. The user field must be present and be a reference to a document in the User collection.
2. The cart field must be present and be an array of references to documents in the Product collection.
3. The address field must be present, but it can be either of type String or a reference to a document in the Address collection.
4. The name field is optional and can be Null, but if it is present, it must be of type String.
5. The status field is not nullable, must be one of the enumerated values, and if not present, defaults to “in-progress.”
6. The last line is a wildcard constraint, but more on that shortly.

Once this schema is in place, if you try to write or update a document in the Order collection and the new document violates this structure, that transaction is rejected by the database. You could also make this collection have a strict schema where documents must have these fields and only these fields. If the document has additional fields, the transaction is rejected.
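To make those field rules concrete, here is a minimal Python sketch of the same checks run against a plain dictionary. It is only a model of the rules as described above, not how Fauna implements enforcement. The names `check_order` and `is_ref`, and the idea of representing a reference as a small dictionary with `coll` and `id` keys, are assumptions made purely for illustration.

```python
from typing import Any

ALLOWED_STATUSES = {"in-progress", "completed", "error"}


def is_ref(value: Any, collection: str) -> bool:
    """Hypothetical stand-in for a document reference, e.g. {"coll": "User", "id": "1"}."""
    return isinstance(value, dict) and value.get("coll") == collection and "id" in value


def check_order(doc: dict) -> dict:
    """Return a copy of doc with the status default applied, or raise ValueError."""
    out = dict(doc)
    out.setdefault("status", "in-progress")  # the default applies only when status is absent
    errors = []

    if not is_ref(out.get("user"), "User"):
        errors.append("user must be a reference to a User document")

    cart = out.get("cart")
    if not (isinstance(cart, list) and all(is_ref(item, "Product") for item in cart)):
        errors.append("cart must be an array of Product references")

    address = out.get("address")
    if not (isinstance(address, str) or is_ref(address, "Address")):
        errors.append("address must be a String or an Address reference")

    name = out.get("name")
    if name is not None and not isinstance(name, str):
        errors.append("name must be a String when present")

    status = out["status"]
    if not (isinstance(status, str) and status in ALLOWED_STATUSES):
        errors.append("status must be one of the enumerated values")

    # Any other fields pass through unchecked, mirroring the wildcard constraint.
    if errors:
        raise ValueError("; ".join(errors))
    return out
```

In this model, a document with a valid user, cart, and address plus an extra ad-hoc field is accepted and comes back with status set to "in-progress", while a document whose address is a number raises an error, which is the sketch's equivalent of a rejected transaction.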

Wildcard constraints to keep some schema flexibility

Now about that wildcard constraint in the example above…

*: Any

There are three ways to think about and work with wildcard constraints:

1. If you have it along with other fields defined in a collection definition, it tells Fauna that it’s OK for incoming documents in this collection to be flexible. This means the document must adhere to the defined schema for this collection, but the wildcard constraint says you can have additional ad-hoc fields in that document.
2. If you have a collection definition and it has no field definitions, that is an implied wildcard constraint. You could put it explicitly in there, but it’s not necessary.
3. If you omit the wildcard constraint line from a collection definition with defined fields, you have a strict schema for this collection. This means the documents in the example Order collection must adhere to the schema provided, and they cannot have ad-hoc fields.

To be overly clear, with the wildcard constraint, any document in the example Order collection above can have additional fields not listed in the schema, and Fauna does not check them. So you get the best of both worlds: schema control and enforcement for the fields you define, and the flexibility to extend the same document with ad-hoc fields.
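The three cases can be condensed into one small decision, sketched here in Python. This is a toy model of the behavior described above, not Fauna code, and the function name `rejected_fields` is hypothetical. It looks only at field names and ignores types entirely.

```python
def rejected_fields(doc: dict, defined: set, wildcard: bool) -> set:
    """Return the ad-hoc field names that would get the document rejected."""
    # Case 1: an explicit wildcard allows extra fields next to the defined ones.
    # Case 2: a collection with no field definitions has an implied wildcard.
    if wildcard or not defined:
        return set()
    # Case 3: defined fields and no wildcard means a strict schema.
    return set(doc) - defined


doc = {"firstName": "Ada", "nickname": "Countess"}
print(rejected_fields(doc, {"firstName"}, wildcard=True))
print(rejected_fields(doc, set(), wildcard=False))
print(rejected_fields(doc, {"firstName"}, wildcard=False))
```

Only the last call reports a problem: with firstName defined and no wildcard, the ad-hoc nickname field is not allowed.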

Zero-Downtime Migrations

While the benefits of field definitions and document types are great, it’s migrations that truly tie everything together and make this work. Migrations let you seamlessly and systematically update each collection’s schema as your needs evolve. As mentioned in my example with a former coworker, most databases do not make altering your schema easy. Even many RDBMSs apply schema changes synchronously and hold locks while doing so, which can create downtime. In most cases, when you make changes, you have to do a ton of heavy lifting to write and test code that runs the migration outside of the database in order to read, transform, and move data to the new schema. I have written hundreds of these in my years working on databases, and they can be a major pain.

Fauna solves this with additions to the Fauna Schema Language (FSL). While FSL previously existed, now it has the ability to incorporate instructions on how to migrate your existing schema to the next iteration in a controlled fashion. FSL files can also be versioned with your existing code with tools like Git and be part of your CI/CD pipelines. Best of all, the FSL runs inside the database. No dragging data to and from a client. You transmit the instructions on how to change the schema to what you want, and Fauna takes care of all the heavy lifting.

For instance, say I began developing my app with a schemaless User collection for user profiles. I didn’t know what the schema would ultimately look like, but now that I am a few days into this, I know a few fields that must be present in every user document going forward.

My existing collection definition in FSL looks like this, perhaps:

collection User {
  *: Any
}

Note: I added the wildcard constraint explicitly for discussion purposes. If that line is omitted, the wildcard constraint is implied.

I want to make sure that every document in the User collection has a first name, a last name, and an email address, while still allowing more fields to be added. Here’s what the schema definition looks like.

collection User {
  firstName: String
  lastName: String
  emailAddr: String
  *: Any
}

As my dev process progresses, I want to specify the fields I know must be in the User document type and have Fauna enforce them, while staying flexible about adding more fields as needed. If I didn’t have any data in the User collection, I could stop here. But I do have data and I don’t want to delete it, so I need to do a migration. Fauna will not assume anything for migrations. You have to give it explicit instructions on what to do.

collection User {
  firstName: String
  lastName: String
  emailAddr: String
  conflicts: { *: Any }?
  *: Any

  migrations {
    add .firstName
    add .lastName
    add .emailAddr
    add .conflicts
    move_conflicts .conflicts
    backfill .emailAddr = "unknown"
    backfill .firstName = "unknown"
    backfill .lastName = "unknown"
  }
}

In the collection definition, I have the structure I showed before, but I added a conflicts field for the migration process in case there is a data type conflict. In the migrations block, I am telling Fauna to add the four fields and to move any field with a conflicting data type into the object in the conflicts field. For example, say there is one document with a value in firstName, but it is a number, not a string, as I have defined firstName to be. That is a conflict, so the migration moves that value into conflicts. The document will still have a firstName field, but its value will be “unknown” because of the backfill statements. The backfills are there because the collection definition says these fields cannot be null, so there has to be something there. In this case, I put “unknown”, but it could be whatever you want. Your application could then look for that value and, if you want, handle it, for example by prompting the user to fill in valid data.
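To illustrate what that migration does to a single document, here is a small Python sketch that mimics the conflict-then-backfill behavior described above on a plain dictionary. It is a simplified model and not Fauna’s implementation. The function name `migrate_user` is hypothetical, and keeping the original field name as the key inside conflicts is an assumption of this sketch.

```python
REQUIRED_STRING_FIELDS = ("firstName", "lastName", "emailAddr")


def migrate_user(doc: dict) -> dict:
    """Return a migrated copy of one User document."""
    out = dict(doc)
    conflicts = {}

    for field in REQUIRED_STRING_FIELDS:
        value = out.get(field)
        # Like move_conflicts: a value of the wrong type is moved aside, not dropped.
        if value is not None and not isinstance(value, str):
            conflicts[field] = out.pop(field)
        # Like backfill: a missing non-nullable field gets the placeholder value.
        if out.get(field) is None:
            out[field] = "unknown"

    if conflicts:
        out["conflicts"] = conflicts
    # All other fields are left alone, as the wildcard constraint allows.
    return out


print(migrate_user({"firstName": 42, "lastName": "Lovelace", "plan": "free"}))
```

For the sample document, firstName comes back as "unknown", the original value 42 is kept under conflicts, emailAddr is backfilled, and the ad-hoc plan field is untouched.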

This is a simple overview, and there is a lot more to migrations, as you can imagine.

Summary

The evolution of NoSQL databases and schemas, particularly with Fauna’s latest release, bridges the gap between the flexibility of schemaless design and the structure of strict schemas. As a document-relational database, Fauna combines the best aspects of both document and relational schema design, offering features like field definitions, document type enforcement, and seamless migrations using Fauna Schema Language. These advancements enable developers to start with a schemaless approach and gradually incorporate structure as their application evolves, which Fauna calls “gradual typing.” This supports long-term performance and cost-effectiveness while maintaining data integrity and adaptability. With these features, Fauna advances how we approach schemas and data modeling in NoSQL databases, making it easier to adapt and scale your database to meet your evolving needs.

For more information about any of these topics, the documentation on collections and documents is your best resource.
