DEV Community

Egor Pavlikhin
Egor Pavlikhin

Posted on

Temporal tables or JSON storage

Problem Definition

Imagine you run a small business and have a simple e-commerce system with Products, Orders and Customers: a customer places an order with a bunch of products inside. You ship the order and mark is as Done. The Customer then logs in and changes their shipping address. You go back to your owner's dashboard to look at past orders and see the new address as the shipment address, although it is clearly wrong.

I want to look at just one possible solution to this problem (in context of relational databases, specifically Postgres), which is to store all data relative to the order somewhere close to it.

Solution 1 - Add columns to Order table

First thing we could try is to denormalize the data and store everything related to an order directly in the Order table. This is quite straight forward and easy to find tools to support it with.

Alt Text

Advantages

  1. Naturally aligns with well supported table structures.
  2. Data is strongly typed and easily explorable - you always know what you will get back.
  3. Easy to query and analyze.

Drawbacks

  1. Hard to support changes to the way customer or product data is stored. Columns moved, deleted or added to the original tables might create problems in historic data.
  2. Similarly, if product is deleted it might be hard or impossible to derive analytical data about it.
  3. Only values are stored - schemas have to be versioned externally (this can be mitigated through Views).

Solution 2 - Temporal tables

Temporal tables are quite powerful, but unfortunately poorly supported in open-source database engines. There are extensions that exist for Postgres, but it's hard to say if they are production ready and there is definitely no official support.

The idea is to have a historic table next to data tables (so Customers and Customers_History) with the data table holding the current version of the data and historic tables storing all the rows that have ever existed with a START (CREATED) and END (CHANGED) dates.

It then becomes quite easy to link a row in the Orders table back to a specific Customer record in any point in time.

Alt Text

Advantages

  1. Normalized data. Everything lives where you expect it to live.
  2. You can get a snapshot of the entire data set at any point in time or exclude certain periods, which can be very powerful for analytical purposes.
  3. Audit and historic data retention is the natural purpose of temporal tables.

Drawbacks

  1. Hard to support changes to the way customer or product data is stored. Columns moved, deleted or added to the original tables might create problems in historic data.
  2. Poor or no support in Postgres - you will have to support this manually and there is no perfect option.
  3. Querying data becomes more complex.
  4. Only values are stored as historical data - schemas have to be versioned externally (this can be mitigated through Views).

Solution 3 - JSON blobs

At last we arrive at my favorite solution - storing snapshots of data as JSON. The possibilities are endless here, but essentially the proposition is to have one column per each data type (Customer, Address, Product) and store all related data in plain JSON object. JSON data can even be versioned or you can store JSON Schema next to it.

Alt Text

Advantages

  1. Postgres has excellent support for JSON data in jsonb type. It's fairly easy to query.
  2. Data can be versioned directly inside the columns.
  3. All relevant data is immediately available on the relevant row. If your front end is using JSON you can pass this data directly from DB with no processing.
  4. To simplify analytics virtual columns / views can be easily created.

Drawbacks

  1. Data in JSON blobs can be inconsistent. You will have to look after it yourself. Most likely you will need to store metadata about each version externally.
  2. Queries will be slower. Much slower. And somewhat harder.
  3. Analytics are harder and most out of the box solutions will likely not work.
  4. DB size will grow significantly, since data will be essentially duplicated for each order. And hopefully you have a lot more orders that products and customers.

Conclusion

For me the clear winner is JSON - my back-end is in NodeJS / NestJS and front-end is in Angular, so manipulating JSON data comes naturally, but I can see how the first option might be attractive to traditional applications.

What would you choose?

Oldest comments (0)