Why NoSQL, and what are some of the options?
Ken W Alger May 8 '17
Have you noticed how locked into a particular technology developers get? Python vs. Java, Angular vs. React, Windows vs. Mac, cats vs. dogs... all topics and subjects that are capable of starting any good "flame war" on most any discussion outlet. Alas, the same goes for relational databases (RDBMS) versus non-relational databases. Since a large percentage of RDBMS utilize Structured Query Language (SQL) as their language and non-relational databases, generally speaking, don't use SQL, the debate has been framed as SQL vs. NoSQL.
SQL and RDMS database design have been around for a long time and are widely used technologies. In fact, they are excellent solutions for many different data models. Does all of your data have the same properties? SQL is great for that. Are you working with complex transactional requests? SQL shines in that environment.
As a real world example, think of your doctor's office and all of the neatly sorted files in their office. Do they store them in a relational way with all diagnosis of diabetes in this file... cancer over here, and then include a reference to a patient's contact information, and another file for everyone with Blue Cross insurance, again with more links? This is the SQL way. Or do they store all of the information about a given patient in one file? That's an example of a NoSQL data model.
Similarly, in the day and age of data coming in from a variety of devices with a wide range of data types, being able to store data in a more flexible format seems to make a lot of sense. A format which should be dependent on your application's access pattern and not necessarily a rigidly formatted data model that has to fit into an existing table structure. NoSQL data stores allow for this flexibility and often allow for faster application development and iterations as the flexibility of design is carried through from the schema itself down to the data record level.
It should come as no surprise that there are a lot of different formats of NoSQL databases. There are key-value store options such as Redis, graph models such as Neo4j or Giraph, column models such as Cassandra and HBase, multi-model options like Couchbase or MarkLogic, and document data models like IBM Domino and MongoDB. Yes, I know, I left out many examples in each of those models, let the flame wars begin...
What are the strengths and weaknesses of each of these different database models? There are several different ways to evaluate the variety of features each database model offers to determine if it is a good fit for a particular organization or application structure. I'd like to look at two of the larger considerations, the way the data itself is modeled, and the way in which the data is queried.
NoSQL Data Models
Key-Value and Column Model
If we have a look at how key-value models store their data, which is similar to the column model, we see that they fall into a rather basic type of model. Each database item is stored as a key, or attribute name, and is associated with a value or in the case of the column model a multi-dimensional sorted map. This can work very well for unstructured data as the database does not require a set schema across key-value pairs. This can scale very well and have high performance due to simplicity of design and the fact that only the key is of interest to the database.
While this is a fast way to represent data, the values in this design cannot be queried, only the keys can. This, obviously, makes it more challenging to do complex queries and aggregations. Having the ability to only query data by a single key value can be limiting.
Graph models use nodes, edges, and properties to represent data. A good example here is a social network where each person is a node, their connections to other people are edges, and properties are the information about a given node, such as a person's name. These databases can require a bit of a learning curve to understand, but work well for things like business supply chains, social networks, or complex hierarchical structures which are often challenging to model in a relational database.
It tends to be a rather niche sort of data store, however, as many traditional sorts of information don't fit well into the graph data store concept. They perform well when the relationship between data records is the important concept than the actual data itself. Exploration of how many relationships away from someone else you are would be a good use case, i.e. the "6 degrees of Kevin Bacon" problem.
Data in a document database can have a dynamic schema where each document can have different fields and fields can be represented by different data types. This can make it very appealing due to the ease of adding new fields during development. The loss of being able to directly do multi-record transactions and JOIN operations is often overcome by this flexible and dynamic schema ability.
NoSQL Query Models
Since each application has it's own requirements for data retrieval and storage, determining how your data needs to be retrieved is an important consideration when making a decision on how to store your data. Some applications may have very basic query needs. Others may have complex queries which search for a variety of values on each record.
If we look back at our patient record example, a doctor's office may have some queries that only require the lookup of a patient's name. Most of the time, however, they will be looking up additional information such as appointment schedule, patient's with a specific diagnosis, certain age ranges, etc. Having a robust enough query language and database to handle such requests is important. Further, since data is rarely static and often needs to be updated, having the ability to update records based on one or more fields is important.
KEY-VALUE AND COLUMN MODEL
As stated in the previous section, these systems can search and retrieve information based on a single, or limited number of, keys. For more complex queries users are often required, in fact encouraged, to develop and maintain their own indexes. Updating records in these systems are often expensive, requiring multiple steps and trips to the database. In fact, it is often required to do an entire rewrite of the record to update, regardless of the size of the update.
As you might imagine with the requirements of these systems, an abundant query language is needed to explore simple and complex relationships. The query language needs to be able to provide direct and indirect assumptions about the data. Therefore, relationship style analysis is of prime importance here with other, general purpose applications being less commonly implemented.
One of the highlights of many document model databases are rich query languages allowing for searches to be performed on any field within a document. Often this includes the ability to add secondary indexes to the database to further increase performance. Data updates can frequently be done in a single trip to the database with some version of a find and modify method.
At the end of the day then where does this leave us? How can we decide on which non-relational database to use? For starters, I would recommend looking at your specific application needs. If schema flexibility, rapid development, and/or efficient data queries are important, NoSQL is definitely worth considering. Once that choice has been made, the selection of which type of non-relational database is next. For many cases, the document model has the broadest, most developer friendly option set.
You want to start putting yourself out there in the universe to share your knowledge.. but when you sit down to write, you thoroughly convince yourself you have no business doing so. Stop it. Go write.