I've recently embarked on scaling up a side hustle, which entails storing a large number of settings and customisations that users can make to a web platform. The product is essentially white label: the customer can change any number of settings to apply their branding and to open up or close off features. The customisation can go right down to the text that is displayed to their users.
Currently, I've been building these one-off for customers to test the water (will someone pay for this?) and storing settings/customisations in JSON files. It's been quick and easy, but now I'm looking to scale to a platform with many times the users as well as sharing amongst teams. In short, the flat JSON files aren't going to scale, and I'm looking into storing settings/customisations in a database.
Whilst it would be easy for me to look at document stores such as MongoDB to closely mimic the functionality I already have with my JSON files, I've also been pushing some relational data into those JSON files which, really, I would rather have sitting in a relational database. Without wanting to proliferate the number of databases in my architecture (albeit if that's the answer then why not!), I have been thinking long and hard about the problem and the potential options.
After some deep thought, I came up with four scenarios. But first, a little bit of information about how a setting can look:
- A single text item, typically for application names and titles.
- An array of items, usually used for storing a number of selected tags or items.
- A string that has a default value but can be customised by a user. Typically used for naming items or message text that can be changed.
- Numerics: usually stored for min and max limits or definitive values in the application.
- Booleans: typically stored for features to be turned on/off.
The important part I can't forget, though, is that there are always defaults. If a user does not adjust a setting, then a default value should be used. This is especially important for the text content shown on screens as well as messages pushed out to users.
Now, the methods and scenarios I've come up with after some brainstorming to manage these settings and customisations:
A single row of settings per customer
In this scenario, using a relational database, I could store all of the settings required for each user's application in a single table, "settings", with each setting taking up a column and each user taking up a row.
Whilst this would be the simplest approach to get started, I don't feel it has longevity. There are problems with ongoing schema changes, as well as with "default values" that some users might not have if they were created under an older schema. I'd need to go through lots of processing to update users and ensure there are no conflicts. However, as far as querying to return a user's settings goes, this offers real simplicity.
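To make it concrete, here's a rough sketch of what that wide table might look like (the column names are just placeholders, not my real settings):

```sql
-- One row per user; each setting is a column with its own default.
-- Adding or removing a setting means an ALTER TABLE and a migration.
CREATE TABLE settings (
    user_id      integer PRIMARY KEY,
    app_title    text    NOT NULL DEFAULT 'My Platform',
    welcome_text text    NOT NULL DEFAULT 'Welcome!',
    max_items    integer NOT NULL DEFAULT 100,
    chat_enabled boolean NOT NULL DEFAULT false
    -- ...and a couple of hundred more columns over time
);

-- The upside: fetching a user's settings is a trivial single-row read.
SELECT * FROM settings WHERE user_id = 42;
```

Column defaults do at least give the "always a default" behaviour for free, though only for rows created or backfilled after each column exists.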
JSON Storage
In this scenario I could take my existing JSON and push it into a relational database such as Postgres for storage. This would reduce the overhead of needing two databases (NoSQL vs relational) whilst also keeping my model close to what I have. The big benefit is that I don't need to worry about my schema ahead of time, and I can add and remove settings as I wish without causing too many problems for other users.
With that flexibility, though, doing any kind of big update across my data is going to be a pain, and trying to work with the data beyond simple selects might bring some overhead with it.
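As a sketch of how this could look in Postgres (table name and keys are placeholders), with defaults merged in at read time:

```sql
-- One JSONB document per user; keys the user never touched simply
-- don't exist in their document.
CREATE TABLE user_settings (
    user_id  integer PRIMARY KEY,
    settings jsonb NOT NULL DEFAULT '{}'
);

-- The || operator overlays the user's overrides onto a defaults
-- document (right-hand side wins), yielding the effective settings.
SELECT '{"app_title": "My Platform", "chat_enabled": false}'::jsonb
       || settings AS effective_settings
FROM user_settings
WHERE user_id = 42;
```

Worth knowing that || merges top-level keys only; nested objects on the right replace those on the left wholesale rather than merging.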
A property bag / key value approach
In this scenario I could create a very simple table containing id, user_id, key, and value columns in my database, giving me the flexibility to store settings with ease, whilst also being able to query for a user_id and pull back the whole set in one go.
My worry with this approach is consistency: if any of the keys become inconsistent, I can end up with a bit of a mess. I also believe this can lead to a really bloated table in my database. Assuming each customer has in the range of 200 settings, this can quite easily become a million rows with just 5,000 users. I probably shouldn't be concerned with query performance on a million rows, I guess.
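The sketch itself is tiny; the catch is that every value ends up stringly typed and has to be coerced back in the application:

```sql
-- One row per (user, setting); a unique constraint stops duplicate
-- keys per user, but nothing stops typo'd key names creeping in.
CREATE TABLE settings (
    id      bigserial PRIMARY KEY,
    user_id integer NOT NULL,
    key     text    NOT NULL,
    value   text    NOT NULL,
    UNIQUE (user_id, key)
);

-- Pull a user's whole set back in one go:
SELECT key, value FROM settings WHERE user_id = 42;
```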
An alternative?
My final option... well, I feel like in each of the above scenarios I've tried to mention relational databases, but I have to face facts: each option is actually flirting with a NoSQL equivalent. Key-value stores such as Redis could be an option, although persistence and decomposing some long values might be a bit too inflexible. Document stores such as MongoDB/CouchDB would match my current model closely, giving each user their own settings document and pulling it back nice and easy. Whilst that sounds like a nice scenario, it means I'd probably also need to spin up a relational database to handle the features of the platform which are clearly relational. Do I want the extra infrastructure to look after, and queries going to two sources now instead of just one? Tough choice.
I'm interested to hear how others have approached this issue, and also whether there are any accepted best practices. Whilst I've been quite woolly on the specifics of my project, I feel this is something others must have run into before!
Top comments (13)
Your property bag idea is usually called the entity-attribute-value or EAV pattern in database design, and it's notoriously messy. There are some situations where it's called for but it's hard to think of an example where it isn't a compromise with an inherently unpleasant structure.
Attribute per column is effective but leads to really sparse tables and puts major implementation & operational constraints around customizability.
Postgres' JSON/JSONB can do a lot more than you might think. Check out the documentation.
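For instance, against a hypothetical user_settings table with a JSONB column, you can index and query inside the documents directly:

```sql
-- A GIN index accelerates containment (@>) queries across documents:
CREATE INDEX user_settings_gin ON user_settings USING gin (settings);

-- Which users have chat switched on?
SELECT user_id
FROM user_settings
WHERE settings @> '{"chat_enabled": true}';

-- Pull a single value out as text with ->> :
SELECT settings ->> 'app_title' AS app_title
FROM user_settings
WHERE user_id = 42;
```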
Thanks for the feedback, and I fully agree on the bag being horrific over time. Do you believe there would be no case for using a document store DB and a relational DB together? I'm seeing lots of good feedback on Postgres, but wanted to check it's not a case of people knowing that tech more or feeling more comfortable with it.
There are even a few nice mappers fully sold on the Postgres route, like massive.js, which should ease the burden. github.com/dmfay/massive-js/blob/m...
hey, that one looks familiar!
I've split data stores before and losing referential integrity and joinability is annoying enough that I'd only do it if there were pressing performance or data governance concerns. With Postgres on the relational side of the equation, foreign data wrappers might ameliorate some of the ugliness but minimizing the number of information sources is still one of the key parts of keeping your architecture simple.
Lol, didn't realise you worked on Massive :P
Do you happen to have any ORM recommendations that work with massive.js, or is the idea purely to skip all of that and just use Massive directly? The documentation makes it clear it's not there to be an ORM, but I do love having type safety in my app as well as default values etc., so if there is a preferred way to get that I'd love to hear it.
There's a subtle but important distinction between "not there to be an O/RM" and "not an O/RM"! Massive fills the same role of helping you get data into and out of Postgres, but it's a different way of approaching the problem: data mapping as opposed to object/relational mapping. The way Massive is organized effectively constitutes an API for your database with tables, views, functions, and scripts as endpoints.
If you want regular type safety, there are TypeScript bindings. But the lack of models is by design: models are an artifact of O/RMs having come up in strictly object-oriented, strongly typed languages like Java and C#. In a functional-hybrid, dynamically typed language like JavaScript, there's little point to them.
Default values can (and really should no matter what, because the database is the final arbiter of what your data looks like) be specified on columns in your CREATE TABLE scripts.
Not really, but a table with 200 columns might be a smell. Why not put the JSON settings in a separate table? JSONB is compressed in PostgreSQL, you can index it, and you can use it in relational SQL queries too. So if you know that some of this data is purely "key value" and some might be part of a query, I would consider using PostgreSQL's full JSON power.
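As a sketch of that split (table and column names are placeholders), the JSON settings sit in their own table but still take part in ordinary relational queries:

```sql
-- Settings live in a separate table, keyed to the user...
CREATE TABLE user_settings (
    user_id  integer PRIMARY KEY REFERENCES users (id),
    settings jsonb   NOT NULL DEFAULT '{}'
);

-- ...but still join and filter like any other relational data.
SELECT u.email, s.settings ->> 'welcome_text' AS welcome_text
FROM users u
JOIN user_settings s ON s.user_id = u.id
WHERE (s.settings ->> 'chat_enabled')::boolean;
```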
Since PostgreSQL 10 you even have full text search on JSON columns.
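For example (same hypothetical table as above), from PostgreSQL 10 onwards to_tsvector accepts jsonb directly and indexes only the string values inside the document:

```sql
-- Full text search over all string values in each settings document:
SELECT user_id
FROM user_settings
WHERE to_tsvector('english', settings) @@ to_tsquery('welcome');
```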
Hi, PostgreSQL + JSON is a nice solution; however, I haven't tested it under heavy load.
I would also recommend you take a look at Aerospike, especially this use case related to user profiles (aerospike.com/solutions/technology...).
I believe it matches the desired features; however, it is expensive, so you would need to measure again, or just ignore that if cost is not a problem (unlikely).
Hope my comment can help you find the ultimate solution.
NOTE: I didn't list any Aerospike features here, just take a deep look at the website! And you will see, it rocks!
BTW, picking one database helps, but usage patterns, sharding, caching and so on shouldn't be underestimated. So consider hybrid (crazy) solutions as well. This site, highscalability.com, has a lot of good examples and sketches.
Have fun!
Thank you for the comment and the links. I'll give them a good read tomorrow morning before work!
I've fought with this decision a lot, and I've actually ended up using two databases.
MongoDB for the settings.
Postgres for relational items.
There's a whole heap of reasons why I decided to go this route and many of them are in the omitted detail behind my original post, but I wanted to keep the question broad enough to provoke some answers without getting buried in details.
Ultimately though, it's come down to:
Finding adapters, access methods, ORMs etc. that would play nicely across an existing website and an existing set of integrations into other systems drove my direction; right now it feels as though the tooling world still needs to catch up with JSONB (specifically) being available in Postgres. I've actually got quite a decent setup right now where I can query across and join multiple data sources regardless of the underlying DB, which means the pain of incorporating more into my architecture is quite low.
I'm fine with running multiple databases if they have a clear purpose and scope. Settings, and only settings, are stored in my doc store, and I'm happy with that model. It also means I can move towards beta much quicker based on my already-known data model from the old system.
Familiarity... I took a good look into the Postgres queries and execution required to pull data out, as well as running updates and modifications on JSONB structures. There are a lot of functions and a lot of new principles to understand which, right now, I could waste a lot of time on without moving forward on my development timeline. I'll definitely be coming back to it, but at this stage I am much more productive running with what I know.
Docker made it too easy.
I have a few services which actually need to know info about the settings (message bots/chat bots) so encapsulating the doc store with relevant APIs around it has turned it into a service for others.
Not sure if you are settled on Postgres, but MySQL / AWS Aurora also have JSON data type support.
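For comparison, a minimal sketch in the MySQL dialect (table and keys are placeholders); the JSON type validates documents on write and supports path extraction in queries:

```sql
-- MySQL 5.7+ / Aurora: native JSON column with path extraction.
CREATE TABLE user_settings (
    user_id  INT PRIMARY KEY,
    settings JSON NOT NULL
);

-- ->> extracts the value at a path and unquotes it:
SELECT user_id,
       settings->>'$.app_title' AS app_title
FROM user_settings
WHERE settings->>'$.chat_enabled' = 'true';
```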
Massive.js looks awesome. Thanks @Dian Fay!
Would love to hear your thoughts 2 years later - how did it go?
Thinking of Postgres + JSON myself in a similar situation.
// Nvm, read the comments
Ended up with everything in Postgres.
The settings table is very wide right now, but I'll be converting it over to a few columns of JSONB in Postgres.
Running two databases really wasn't an issue, but I found myself often wanting to join data across both, e.g. setting A is turned on and therefore I need some data from table B.
I decided to move everything off Mongo once I got used to using JSON in Postgres. I use the Citus extension to scale horizontally as needed.
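To illustrate the payoff (all names hypothetical), that cross-store lookup collapses into a single query once both sides live in Postgres, and Citus can shard the table later:

```sql
-- "Is setting A turned on? If so, fetch the related rows from table B"
-- becomes one ordinary join instead of two round trips to two stores.
SELECT b.*
FROM user_settings s
JOIN table_b b USING (user_id)
WHERE (s.settings ->> 'setting_a')::boolean;

-- With the Citus extension, the settings table can be distributed
-- across worker nodes, sharded by user:
SELECT create_distributed_table('user_settings', 'user_id');
```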