DEV Community

Ken W Alger

Posted on • Originally published at kenwalger.com

Big Data and NoSQL: A Great Coupling

Big data has been a buzzword and topic of conversation for many years. Similarly, the idea of NoSQL as a data store versus "traditional" relational databases is a frequently debated topic. With all of the conversation about these topics, it is important to ask a few questions. What is this thing called big data? Are NoSQL databases like OrientDB a good fit for big data? Let's explore these questions and learn why NoSQL can indeed be a great fit for big data.

Big Data Defined

A popular definition of big data states:

Any quantity of data which is too large or complex that traditional computing power is inadequate to deal with.

While this makes sense, it is important to note that the concept of big data is somewhat relative. It likely means something quite different to Farmer Bob, the local apple farmer, than it does to the Tree Top juice company. In the technology realm, we often think of companies like Facebook or Netflix in relation to big data processing. But every industry is capturing more and more data.

Think about our apple farmers. The data they might want to collect includes soil moisture levels, air and soil temperatures, rainfall, the type of apple being grown, the geo-location of the trees, soil pH, fertilizer applications, and pollination events, just to name a few.

In the data industry, there are four common dimensions of big data, the so-called 4 V's: volume, variety, velocity, and veracity.

4 Dimensions of Big Data
  • Volume - the size and scale of the data. A terabyte or more is often treated as the breakpoint, but even that is relatively small: by one widely quoted estimate, a Boeing 737 generates approximately 240 terabytes of data flying across the United States. How 'bout them apples?
  • Variety - the diversity of data in different forms. In our farming example, there could be different sensors around the orchard, each sending its own kind of data into the database.
  • Velocity - the speed at which data is collected and needs to be processed. Perhaps not entirely applicable for Farmer Bob, but for many big data applications velocity is very important. Stock trading algorithms, for example, work with data that changes very quickly, so processing must be timely.
  • Veracity - the quality of the data itself. Is it accurate? Is it stored accurately, and can it be retrieved accurately?

All of these pieces together help you decide whether you are dealing with big data.
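To get a feel for the volume and velocity dimensions, here is a rough back-of-the-envelope sketch. Every number in it (tree count, sensors per tree, reading frequency, bytes per reading) is a made-up assumption for illustration, not a measurement from a real orchard.

```python
# All figures below are hypothetical assumptions, chosen only to illustrate the arithmetic.
TREES = 5_000                         # trees in the orchard
SENSORS_PER_TREE = 3                  # e.g. soil moisture, soil pH, air temperature
READINGS_PER_SENSOR_PER_DAY = 24 * 60 # one reading per minute
BYTES_PER_READING = 200               # a small record with a timestamp and a few fields
TERABYTE = 10**12                     # decimal terabyte

# Volume: how much data accumulates each day.
readings_per_day = TREES * SENSORS_PER_TREE * READINGS_PER_SENSOR_PER_DAY
bytes_per_day = readings_per_day * BYTES_PER_READING

# Velocity: how many writes per second the database has to absorb.
readings_per_second = readings_per_day / 86_400

# How long until the orchard crosses the one-terabyte mark.
days_to_one_terabyte = TERABYTE / bytes_per_day

print(f"{readings_per_day:,} readings per day")
print(f"{bytes_per_day / 10**9:.2f} GB per day")
print(f"{readings_per_second:.0f} readings per second")
print(f"{days_to_one_terabyte:.0f} days to reach 1 TB")
```

Under these assumptions the orchard writes a few gigabytes a day and takes most of a year to reach a terabyte. Change the reading frequency or record size and the picture shifts quickly, which is why "big" is relative.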

Assuming, then, that our apple orchard is large enough to qualify for big data membership, why would we choose a NoSQL database? How does a NoSQL database, like OrientDB, help to address the concerns of the 4 V's?

Stop the Presses: NoSQL for the Win

Okay, a bit of a pun there with presses while talking about apple orchards, please forgive me. But seriously, NoSQL does offer some great wins for storing big data. In fact, in many of the areas of the 4 V's NoSQL is a better choice than its relational, SQL, counterpart.

The variety of data becomes problematic in a relational data model, as it requires multiple tables and many joins to process the data. A multi-model database like OrientDB is capable of storing records in a schemaless fashion that is easy to generate programmatically. This allows for data with a variety of "looks" to be stored quickly, each as a single record. And because OrientDB is multi-model, different types of records can be stored in the same database.
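As a small illustration of what "schemaless" means in practice, here is a sketch in plain Python. The sensor names and fields are hypothetical, and a simple list stands in for the database; this is not OrientDB's API, just the shape of the idea.

```python
import json

# Hypothetical orchard readings. Each record has its own "look";
# none of these field names come from a real OrientDB schema.
readings = [
    {"sensor": "soil-probe-7", "type": "soil", "moisture_pct": 31.5, "ph": 6.4},
    {"sensor": "weather-1", "type": "weather", "air_temp_c": 18.2, "rainfall_mm": 0.0},
    {"sensor": "tree-tag-112", "type": "tree", "variety": "Honeycrisp",
     "location": {"lat": 46.6, "lon": -120.5}},
]

# A plain list stands in for a schemaless store: every record is accepted as-is,
# with no table definition and no migration when a new sensor type shows up.
store = []
for reading in readings:
    # Round-trip through JSON to mimic serializing the record on its way to storage.
    store.append(json.loads(json.dumps(reading)))

# Each stored record carries only the fields that apply to it.
fields_by_type = {
    record["type"]: sorted(key for key in record if key not in ("sensor", "type"))
    for record in store
}
print(fields_by_type)
```

In a relational design, the same three readings would typically need either three tables or one wide table full of empty columns.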

Further, without having to do joins, traversing the data for read operations is very fast. The ability to read and write records rapidly is vital for keeping up with the velocity of data. Database performance is a big part of being able to reliably process your data and get results.
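The idea of reading by traversal rather than by join can be sketched with records that hold direct links to their related records. The record IDs and fields here are hypothetical, and an in-memory dictionary stands in for the database, so treat it as a picture of the access pattern rather than of OrientDB's internals.

```python
# Records keyed by a hypothetical record ID. Each record stores direct links
# (IDs) to the records it is related to, instead of relying on a join table.
records = {
    "orchard:1": {"name": "North Block", "trees": ["tree:1", "tree:2"]},
    "tree:1": {"variety": "Fuji", "sensors": ["sensor:1"]},
    "tree:2": {"variety": "Gala", "sensors": ["sensor:2", "sensor:3"]},
    "sensor:1": {"kind": "soil_moisture"},
    "sensor:2": {"kind": "soil_ph"},
    "sensor:3": {"kind": "air_temp"},
}


def sensors_in_orchard(orchard_id):
    """Follow links orchard -> trees -> sensors, one direct lookup per hop."""
    result = []
    for tree_id in records[orchard_id]["trees"]:
        for sensor_id in records[tree_id]["sensors"]:
            result.append((tree_id, records[sensor_id]["kind"]))
    return result


print(sensors_in_orchard("orchard:1"))
```

Each hop is a lookup by ID, so the cost depends on how many links are followed rather than on how large the other collections are.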

One of the challenges that SQL databases face is scalability. They are mostly designed around a master-slave architecture, which often becomes a bottleneck. The usual way to scale is to upgrade to bigger and faster hardware. In the NoSQL world, we can scale with a multi-master and sharded architecture. This allows for horizontal scaling, and it adds to veracity as well, since the new servers can be spread out across geographic locations.
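To make sharding a little more concrete, here is a minimal sketch of hash-based routing, where a record's key decides which server stores it. The shard names are hypothetical, and this is one simple routing scheme rather than a description of how OrientDB distributes data.

```python
import hashlib

# Hypothetical servers spread across locations.
SHARDS = ["shard-us-west", "shard-us-east", "shard-eu"]


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the key, so the same key always lands on the same server."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


for sensor_id in ["soil-probe-7", "weather-1", "tree-tag-112"]:
    print(sensor_id, "->", shard_for(sensor_id))
```

One caveat with plain modulo routing like this: adding a server changes where most keys map, which is why production systems tend to use schemes such as consistent hashing instead.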

Making sure the data you think you are storing is actually being stored is probably the most important point of all. Having the confidence that your database transactions are secure and accurate is huge. OrientDB transactions are ACID compliant, providing an extra level of guarantee that your data is your data. This also helps to make sure that the technical department doesn't upset the apple cart with corporate executives.
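For a feel of what the "A" in ACID buys you, here is a small sketch of an all-or-nothing transaction. It uses Python's built-in sqlite3 module purely because it ships with the standard library; it is a stand-in for the concept, not OrientDB's API, and the table and bin names are made up.

```python
import sqlite3

# In-memory database with a hypothetical table of harvest bins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE harvest (bin_id TEXT PRIMARY KEY, kilograms REAL NOT NULL)")
conn.execute("INSERT INTO harvest VALUES ('bin-1', 400.0), ('bin-2', 0.0)")
conn.commit()


def move_apples(conn, src, dst, kg):
    """Move kg of apples between bins; both updates happen or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE harvest SET kilograms = kilograms - ? WHERE bin_id = ?", (kg, src)
        )
        remaining = conn.execute(
            "SELECT kilograms FROM harvest WHERE bin_id = ?", (src,)
        ).fetchone()[0]
        if remaining < 0:
            # Raising here undoes the first UPDATE as well.
            raise ValueError("not enough apples in source bin")
        conn.execute(
            "UPDATE harvest SET kilograms = kilograms + ? WHERE bin_id = ?", (kg, dst)
        )


try:
    move_apples(conn, "bin-1", "bin-2", 500.0)  # more than bin-1 holds
except ValueError as error:
    print("rolled back:", error)

print(dict(conn.execute("SELECT bin_id, kilograms FROM harvest")))
```

The failed transfer leaves both bins exactly as they were, which is the kind of guarantee that lets you trust the numbers you read back later.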

Wrap Up

We have taken a look at what big data is and how to determine if your data is big data. We then examined why NoSQL is a great fit for storing it. OrientDB is definitely worth a look for your big data storage needs.


Follow me on Twitter @kenwalger to get the latest updates on my postings or read my other posts on my blog.

Top comments (2)

Jon Bristow

Do you think NoSQL can keep up when doing arbitrary analysis of structured data? I’m under the impression that for real data analysis work you’re going to want a query/access optimizer to help you out.

Also, I’m interested to hear how OrientDB performs when returning near-petabyte result sets.

Ken W Alger

I think it all depends on your application and data design. NoSQL can be incredibly fast for processing but to get the most "bang for your buck" schemas need to be designed differently than they are in a SQL world. It's not a bad thing. It's just a different data modeling style.

In terms of OrientDB performance specifically, it can handle large loads. However, like I just said, the data model is important for top performance. There are many resources available on NoSQL schema design that are well worth the time to read.