Ken W Alger

Posted on May 8, 2017

Why NoSQL, and what are some of the options?

#nosql #mongodb #graphdatabases

Have you noticed how locked into a particular technology developers get? Python vs. Java, Angular vs. React, Windows vs. Mac, cats vs. dogs... all topics capable of starting a good "flame war" on almost any discussion outlet. Alas, the same goes for relational databases (RDBMS) versus non-relational databases. Since a large percentage of RDBMSs use Structured Query Language (SQL) as their language and non-relational databases, generally speaking, don't use SQL, the debate has been framed as SQL vs. NoSQL.

SQL and RDBMS design have been around for a long time and are widely used technologies. In fact, they are excellent solutions for many different data models. Does all of your data have the same properties? SQL is great for that. Are you working with complex transactional requests? SQL shines in that environment.

As a real-world example, think of your doctor's office and all of the neatly sorted files in it. Do they store them in a relational way, with all diagnoses of diabetes in this file, cancer over here, a reference to each patient's contact information, and another file for everyone with Blue Cross insurance, again with more links? This is the SQL way. Or do they store all of the information about a given patient in one file? That's an example of a NoSQL data model.
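To make the two filing styles concrete, here is a minimal sketch using Python's built-in sqlite3 module for the relational side and a plain dictionary for the "one file per patient" side. The table names, field names, and the patient record are made up for illustration.

```python
import sqlite3

# Relational layout: patient details are split across tables linked by id.
# All table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE diagnoses (patient_id INTEGER REFERENCES patients(id), diagnosis TEXT);
    CREATE TABLE insurance (patient_id INTEGER REFERENCES patients(id), provider TEXT);
    INSERT INTO patients VALUES (1, 'Pat Example', '555-0100');
    INSERT INTO diagnoses VALUES (1, 'diabetes');
    INSERT INTO insurance VALUES (1, 'Blue Cross');
""")

# Reassembling one patient's file means following the links with JOINs.
relational_row = conn.execute("""
    SELECT p.name, p.phone, d.diagnosis, i.provider
    FROM patients p
    JOIN diagnoses d ON d.patient_id = p.id
    JOIN insurance i ON i.patient_id = p.id
    WHERE p.id = 1
""").fetchone()

# Document layout: everything about the patient lives in one record.
patient_document = {
    "_id": 1,
    "name": "Pat Example",
    "phone": "555-0100",
    "diagnoses": ["diabetes"],
    "insurance": {"provider": "Blue Cross"},
}
```

Both layouts hold the same facts. The difference is whether the patient's file is assembled at query time or stored already assembled.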

Similarly, in the day and age of data coming in from a variety of devices with a wide range of data types, being able to store data in a more flexible format seems to make a lot of sense. A format which should be dependent on your application's access pattern and not necessarily a rigidly formatted data model that has to fit into an existing table structure. NoSQL data stores allow for this flexibility and often allow for faster application development and iterations as the flexibility of design is carried through from the schema itself down to the data record level.

It should come as no surprise that there are a lot of different formats of NoSQL databases. There are key-value store options such as Redis, graph models such as Neo4j or Giraph, column models such as Cassandra and HBase, multi-model options like Couchbase or MarkLogic, and document data models like IBM Domino and MongoDB. Yes, I know, I left out many examples in each of those models; let the flame wars begin...

What are the strengths and weaknesses of each of these different database models? There are several ways to evaluate the features each database model offers to determine if it is a good fit for a particular organization or application structure. I'd like to look at two of the larger considerations: the way the data itself is modeled, and the way in which the data is queried.

NoSQL Data Models

Key-Value and Column Model

If we have a look at how key-value models store their data, which is similar to the column model, we see that they fall into a rather basic type of model. Each database item is stored as a key, or attribute name, and is associated with a value or, in the case of the column model, a multi-dimensional sorted map. This can work very well for unstructured data as the database does not require a set schema across key-value pairs. This model can scale very well and deliver high performance due to the simplicity of its design and the fact that only the key is of interest to the database.

While this is a fast way to represent data, the values in this design cannot be queried, only the keys can. This, obviously, makes it more challenging to do complex queries and aggregations. Having the ability to only query data by a single key value can be limiting.
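A toy in-memory sketch of that trade-off follows. The `KeyValueStore` class and its method names are hypothetical and not the API of Redis or any other product; the point is only that a lookup by key is direct, while a question about the values forces the application to inspect every entry.

```python
class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Direct lookup: the only kind of query the store itself supports.
        return self._data.get(key)

    def scan_values(self, predicate):
        # Fallback for value-based questions: check every single entry.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("patient:1", {"name": "Pat Example", "diagnosis": "diabetes"})
store.put("patient:2", {"name": "Sam Sample", "diagnosis": "asthma"})

by_key = store.get("patient:1")
diabetic_keys = store.scan_values(lambda v: v["diagnosis"] == "diabetes")
```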

Graph Model

Graph models use nodes, edges, and properties to represent data. A good example here is a social network where each person is a node, their connections to other people are edges, and properties are the information about a given node, such as a person's name. These databases can require a bit of a learning curve to understand, but work well for things like business supply chains, social networks, or complex hierarchical structures which are often challenging to model in a relational database.

Graph databases tend to be a rather niche sort of data store, however, as many traditional sorts of information don't fit well into the graph concept. They perform well when the relationship between data records is more important than the actual data itself. Exploring how many relationships away from someone else you are would be a good use case, i.e. the "6 degrees of Kevin Bacon" problem.
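As a rough illustration of the degrees-of-separation question, here is a breadth-first search over a small, made-up social graph held in a Python dictionary. A real graph database would express this in its own query language; the names and connections below are invented.

```python
from collections import deque

# Hypothetical social graph: each person (node) maps to their connections (edges).
graph = {
    "Kevin Bacon": {"Alice", "Bob"},
    "Alice": {"Kevin Bacon", "Carol"},
    "Bob": {"Kevin Bacon"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Carol"},
    "Erin": set(),
}


def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unconnected."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


dave_to_kevin = degrees_of_separation(graph, "Dave", "Kevin Bacon")
erin_to_kevin = degrees_of_separation(graph, "Erin", "Kevin Bacon")
```

Here Dave reaches Kevin Bacon through Carol and Alice, while Erin has no connections at all and so has no path.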

Document Model

Instead of rows and columns, document model databases store their information in, well, documents. If you missed my post on Modeling your data with Documents, it goes into more detail. Most often the data is stored in a structure similar to JSON (JavaScript Object Notation). This allows data to be stored in a manner familiar to most developers and allows each piece of data to effectively become an object. This is a key concept, closely aligned with the familiar object-oriented programming pattern.

Data in a document database can have a dynamic schema where each document can have different fields and fields can be represented by different data types. This can make it very appealing due to the ease of adding new fields during development. The loss of being able to directly do multi-record transactions and JOIN operations is often overcome by this flexible and dynamic schema ability.
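A small sketch of what a dynamic schema looks like in practice, using plain Python dictionaries as stand-ins for JSON-like documents. The collection and field names are hypothetical, and no particular database is assumed.

```python
import json

# Two documents in the same (hypothetical) collection with different shapes.
patients = [
    {"_id": 1, "name": "Pat Example", "age": 42},
    {"_id": 2, "name": "Sam Sample", "age": "unknown", "allergies": ["penicillin"]},
]

# Adding a new field during development touches only the documents that need it;
# there is no table definition to alter first.
patients[0]["insurance"] = {"provider": "Blue Cross"}

# Each document carries its own set of fields.
field_sets = [sorted(doc) for doc in patients]

# Documents serialize naturally to JSON.
as_json = json.dumps(patients[0], sort_keys=True)
```

Note that `age` is a number in one document and a string in the other. The flexibility is real, but it also means the application has to cope with both.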

NoSQL Query Models

Since each application has its own requirements for data retrieval and storage, determining how your data needs to be retrieved is an important consideration when deciding how to store your data. Some applications may have very basic query needs. Others may have complex queries which search for a variety of values on each record.

If we look back at our patient record example, a doctor's office may have some queries that only require the lookup of a patient's name. Most of the time, however, they will be looking up additional information such as appointment schedules, patients with a specific diagnosis, certain age ranges, etc. Having a robust enough query language and database to handle such requests is important. Further, since data is rarely static and often needs to be updated, having the ability to update records based on one or more fields is important.

Key-Value and Column Model

As stated in the previous section, these systems can search and retrieve information based on a single key or a limited number of keys. For more complex queries, users are often required, in fact encouraged, to develop and maintain their own indexes. Updating records in these systems is often expensive, requiring multiple steps and trips to the database. In fact, an update often requires rewriting the entire record, regardless of the size of the change.
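The sketch below imitates both costs with two Python dictionaries: one acting as the key-value store holding opaque serialized values, and one acting as an index the application has to maintain itself. The helper names `put_patient` and `update_phone` are hypothetical, and a real store would involve network round trips where this code has dictionary accesses.

```python
import json

kv = {}               # primary store: key -> serialized value (an opaque blob)
diagnosis_index = {}  # application-maintained index: diagnosis -> set of keys


def put_patient(key, record):
    """Write the whole record and keep the hand-rolled index in step."""
    old = kv.get(key)
    if old is not None:
        # Remove the stale index entry before writing the new value.
        old_diagnosis = json.loads(old)["diagnosis"]
        diagnosis_index[old_diagnosis].discard(key)
    kv[key] = json.dumps(record)
    diagnosis_index.setdefault(record["diagnosis"], set()).add(key)


def update_phone(key, phone):
    """Changing one field still means read everything, then write everything."""
    record = json.loads(kv[key])  # trip 1: fetch and decode the full value
    record["phone"] = phone
    put_patient(key, record)      # trip 2: rewrite the full value


put_patient("patient:1", {"name": "Pat Example", "diagnosis": "diabetes", "phone": "555-0100"})
update_phone("patient:1", "555-0199")
```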

Graph Model

As you might imagine given the requirements of these systems, a rich query language is needed to explore simple and complex relationships. The query language needs to be able to draw direct and indirect inferences about the data. Relationship-style analysis is therefore of prime importance here, with other, general-purpose applications being less commonly implemented.

Document Model

One of the highlights of many document model databases is a rich query language that allows searches to be performed on any field within a document. Often this includes the ability to add secondary indexes to the database to further increase performance. Data updates can frequently be done in a single trip to the database with some version of a find and modify method.
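To show the shape of those two ideas without tying them to a particular product, here is a toy in-memory version. The `find` and `find_and_modify` helpers are hypothetical stand-ins written for this example, not any database's actual API, and they only support exact matches on top-level fields.

```python
def matches(doc, criteria):
    # A document matches when every requested field has the requested value.
    return all(doc.get(field) == value for field, value in criteria.items())


def find(collection, criteria):
    """Return every document matching the criteria, on any field."""
    return [doc for doc in collection if matches(doc, criteria)]


def find_and_modify(collection, criteria, changes):
    """Locate the first matching document and update it in a single call."""
    for doc in collection:
        if matches(doc, criteria):
            doc.update(changes)
            return doc
    return None


patients = [
    {"_id": 1, "name": "Pat Example", "diagnosis": "diabetes", "age": 42},
    {"_id": 2, "name": "Sam Sample", "diagnosis": "asthma", "age": 35},
    {"_id": 3, "name": "Lee Sample", "diagnosis": "diabetes", "age": 61},
]

diabetic = find(patients, {"diagnosis": "diabetes"})
updated = find_and_modify(patients, {"name": "Sam Sample"}, {"age": 36})
```

The caller describes which document it wants and what should change, and gets the modified document back from one call rather than a separate read followed by a write.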

Takeaways

At the end of the day, then, where does this leave us? How can we decide which non-relational database to use? For starters, I would recommend looking at your specific application needs. If schema flexibility, rapid development, and/or efficient data queries are important, NoSQL is definitely worth considering. Once that choice has been made, the next step is selecting which type of non-relational database to use. For many cases, the document model has the broadest, most developer-friendly option set.

Follow me on Twitter @kenwalger to get the latest updates on my postings, or see the original post on my blog. Let me know how you are using NoSQL for your application needs.

Top comments (3)

Jan van Brügge • May 9 '17

Similarly, in the day and age of data coming in from a variety of devices with a wide range of data types, being able to store data in a more flexible format seems to make a lot of sense. A format which should be dependent on your application's access pattern and not necessarily a rigidly formatted data model that has to fit into an existing table structure.

This is what is wrong with Web Development today. NO, your Database should not mirror your application logic. Your database is a storage layer for information. You have APIs for accessing it the way you want to do it.
As a side note, a NoSQL DB does not make it easier to develop if you have different access patterns encoded in your data. Instead of knowing what is in your database (aka your schema), you have nothing but application code of X different apps that throw arbitrary JSON in your DB.

In general there are few reasons to use a NoSQL database. Those are basically having large BLOBs and not much else. Note that this does not include graph databases, as those have a lot of valid use cases like a social graph, etc.

Ben Halpern • May 8 '17

Thanks for this, Ken. Really clarifies some of these concepts.

Ken W Alger • May 8 '17

Pleased to hear it was helpful for you.