SeattleDataGuy

Posted on Oct 13, 2019

DynamoDB vs. Hadoop vs. MongoDB

#sql #hadoop #cloud

The best database for your current business needs usually depends on your dev team's skill set and the applications already in place.

Understanding which database system will best fit your company's current and future needs is an important step. Databases play a crucial role in all industries and organizations.

Thus, picking the system that best fits both your requirements and your budget can be the difference between a failed project and a successful strategy implementation.

With the ever-expanding landscape of ways your company can store data, we wanted to compare some of the more modern database systems companies are using.

Understanding what DynamoDB, Hadoop, and MongoDB offer will help you make a better decision for your business model. These systems are not necessarily interchangeable, and in some cases comparing them is like comparing apples and oranges. However, because they all generally fall under the NoSQL umbrella, they often get grouped together.

So, let's start with an introduction to each of these systems and then compare them.

What Is DynamoDB?

Image: Amazon DynamoDB (from AWS database blog)

Created by Amazon, DynamoDB is a proprietary NoSQL database service offered as part of the Amazon Web Services (AWS) portfolio. The name comes from Dynamo, a highly available key-value store built in response to Amazon's e-commerce holiday outages in 2004.

At first, only a few teams within Amazon adopted Dynamo because of its high operational complexity and the trade-offs it required between data consistency, performance, query flexibility, and reliability.

During this period, many Amazon developers also preferred SimpleDB, Amazon's first NoSQL database service, which relieved users of database administration tasks. But SimpleDB had several limitations that eventually restricted its use.

Launched in 2012, DynamoDB is an AWS database service created to address the limitations of both Dynamo and SimpleDB.

What Is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is designed to scale from single servers to many machines, with each machine contributing local computation and storage.

Instead of relying on hardware to deliver high availability, Hadoop itself is designed to detect and handle failures at the application layer.
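To make the "simple programming models" concrete, here is a minimal single-process sketch of the MapReduce pattern that Hadoop is known for. It is plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data everywhere"])
print(counts)
```

Because each map call looks at only one line and each reduce call at only one key, the work can be split across machines without the steps needing to know about each other.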

Image: What is Hadoop (from IBM big data and Analytics Hub)

A closer look shows even more of the appeal: Hadoop is highly modular. You can swap out almost any of its parts for a different software tool, which makes for an incredibly flexible architecture that is also effective and robust.

What Is MongoDB?

MongoDB is a non-tabular and open database created by MongoDB Inc. The founders initially set out to build a platform from entirely open-source components, but their struggle to find an existing database that met their requirements for building cloud services led them to create their own database system.

Image: MongoDB (from MongoDB Sharded Cluster)

After realizing the potential of the database software itself, the team shifted its focus to building MongoDB. Released in 2009, MongoDB is intended to provide a technological foundation that supports development teams through its distributed systems design, document data model, and unified experience.

In 2016, MongoDB announced its fully managed cloud database service, MongoDB Atlas. Atlas runs genuine MongoDB and lets users offload specific operational tasks.

Now, the differences.

Ease of Use, Setup, Admin

DynamoDB

DynamoDB is a managed service that abstracts away the underlying infrastructure; users interact only with the database over a remote endpoint.

There is no need to worry about operational concerns or provisioning additional hardware, which makes DynamoDB very easy to get started with.

Hadoop

Hadoop has several options when it comes to setup. It is possible to manage Hadoop with almost zero abstraction, using just the command line.

Of course, this means you need to be comfortable with the command line and understand how to set up the hardware. Because of this complexity, multiple companies, such as Cloudera, help you manage Hadoop with less heavy lifting.

If done well, using a third party could save you hundreds of thousands in personnel costs (hiring a single Hadoop engineer often costs upwards of 150k).

MongoDB

Of the systems that are not SaaS, MongoDB is one of the most straightforward to manage. You can download it and start interacting with it quickly. Here is a quick guide for your Mac.

Quality of Support

DynamoDB

For DynamoDB, support is available through the community support forum, enterprise support, ServerFault, and Stack Overflow.

The DynamoDB community offers sample applications, drivers, extensions, and tools. In addition, since DynamoDB is part of AWS, depending on the size of your business, you might get further support directly from Amazon.

Hadoop

For Hadoop, several businesses provide commercial implementations and support for this system. Hadoop has been around long enough to have multiple communities, support tools, and courses to help improve your ability to manage and develop on the system.

Personally, we feel it can be one of the more difficult systems to get support for if you rely only on the original software.

However, so many third parties have stepped in to abstract this away that we think most large organizations can comfortably consider Hadoop as a data storage system.

MongoDB

MongoDB offers a community support forum, ServerFault, and Stack Overflow. Users can also get 24/7 enterprise-grade support, with an optional lifecycle add-on.

The MongoDB community also provides information about events, MongoDB University, user groups, and webinars.

Database Structure

DynamoDB

DynamoDB uses tables, items, and attributes as the core components you work with.

A table is a collection of items, and each item is a collection of attributes.
DynamoDB uses primary keys to uniquely identify each item in a table.
Secondary indexes offer more flexibility in querying.
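The relationship between tables, items, attributes, primary keys, and secondary indexes can be sketched with a toy in-memory model. This is plain Python, not the AWS SDK; the Table class, its method names, and the sample attributes (player_id, team, level) are all hypothetical and only illustrate the concepts.

```python
class Table:
    """Toy model of a table: each item is a dict of attributes."""

    def __init__(self, key_name):
        self.key_name = key_name  # attribute that acts as the primary key
        self.items = {}           # primary key value -> item
        self.indexes = {}         # indexed attribute -> {value -> set of primary keys}

    def add_index(self, attribute):
        """Create a secondary index, covering any items already stored."""
        index = {}
        for pk, item in self.items.items():
            if attribute in item:
                index.setdefault(item[attribute], set()).add(pk)
        self.indexes[attribute] = index

    def put_item(self, item):
        """Insert or overwrite the item identified by its primary key."""
        pk = item[self.key_name]
        old = self.items.get(pk, {})
        for attribute, index in self.indexes.items():
            if attribute in old:
                index[old[attribute]].discard(pk)  # drop the stale index entry
        self.items[pk] = dict(item)
        for attribute, index in self.indexes.items():
            if attribute in item:
                index.setdefault(item[attribute], set()).add(pk)

    def get_item(self, pk):
        """Look up exactly one item by primary key."""
        return self.items.get(pk)

    def query_index(self, attribute, value):
        """Find items through a secondary index instead of the primary key."""
        pks = self.indexes[attribute].get(value, set())
        return [self.items[pk] for pk in sorted(pks)]


players = Table(key_name="player_id")
players.add_index("team")
players.put_item({"player_id": "p1", "name": "Ada", "team": "red"})
players.put_item({"player_id": "p2", "name": "Lin", "team": "blue", "level": 7})
players.put_item({"player_id": "p3", "name": "Sam", "team": "red"})
```

Note that the items do not all carry the same attributes (only p2 has a level); only the primary key is required on every item in this sketch.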

MongoDB

MongoDB stores schema-free data in JSON-like documents.

A collection of documents in MongoDB has no predefined columns, and the structure can differ from one document to the next. Features that MongoDB shares with relational databases include:

Easy-to-read query language.
Strong consistency.

Because it is schema-free, MongoDB lets you create documents without defining their structure first.

The core concepts map onto those of a Relational Database Management System (RDBMS) as follows:

RDBMS   | Table      | Column | Value | Record
MongoDB | Collection | Key    | Value | Document

In other words, a collection in MongoDB plays the role of a table in an RDBMS, and a document resembles a record.
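As a rough illustration of the document model, the sketch below represents a collection as a list of Python dicts. It does not use a MongoDB driver; the find helper, the keys_used helper, and the sample articles are assumptions made up for this example.

```python
import json

# A "collection" modeled as a plain list of documents (dicts).
# The two documents deliberately have different keys, and one embeds
# a nested document, which a fixed-column table row could not do directly.
articles = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "databases"]},
    {"_id": 2, "title": "Scaling Reads", "author": {"name": "Kim", "team": "data"}},
]


def find(collection, **criteria):
    """Return documents whose top-level keys equal the given values."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def keys_used(collection):
    """List every key that appears in at least one document."""
    return sorted({key for doc in collection for key in doc})


match = find(articles, title="Scaling Reads")[0]
print(json.dumps(match))      # documents serialize directly to JSON
print(keys_used(articles))
```

No structure was declared before the documents were added, which is the practical meaning of schema-free here.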

Hadoop

Hadoop does not impose a data structure; it simply stores data in whatever format you give it. Hadoop applies the schema-on-read method, which makes it versatile across all kinds of data sets.

All data in Hadoop is stored as files in its distributed file system, and other technologies like Hive and Impala add a schema on top of those files, which lets you view the underlying data in table format.
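Schema-on-read can be sketched in a few lines: the raw file is stored untouched, and column names and types are applied only when the data is read. This is a plain Python illustration, not Hive or Impala; RAW_FILE, SCHEMA, and read_with_schema are hypothetical names and the sensor data is made up.

```python
import csv
import io

# Raw text as it might sit in a file: no schema was enforced on write,
# so a malformed value made it into storage.
RAW_FILE = (
    "2019-10-01,sensor-a,21.5\n"
    "2019-10-02,sensor-b,not-a-number\n"
    "2019-10-03,sensor-a,19.0\n"
)

# The schema lives with the reader, not with the stored data.
SCHEMA = [("day", str), ("sensor", str), ("reading", float)]


def read_with_schema(raw_text, schema):
    """Apply column names and types at read time; collect rows that do not fit."""
    rows, rejected = [], []
    for fields in csv.reader(io.StringIO(raw_text)):
        try:
            rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
        except ValueError:
            rejected.append(fields)
    return rows, rejected


rows, rejected = read_with_schema(RAW_FILE, SCHEMA)
print(len(rows), "rows parsed,", len(rejected), "rejected")
```

A different reader could apply a different schema to the same bytes, which is where the versatility comes from, and also why bad records only surface at query time.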

If you are managing Hadoop yourself from the original software, this can become very complex, because the file types and encodings you pick play a huge role in everything from speed to space. It can also be very difficult to reverse specific decisions.

Right for Your Business

DynamoDB

DynamoDB remains a popular choice for the gaming and Internet of Things (IoT) sector. If you use the AWS stack and you desire a NoSQL database, then DynamoDB is a great option.

Bear in mind that you may not have access to embedded data structures the way you do in MongoDB.

Hadoop

Hadoop is a popular choice for large-scale enterprises that need server clusters and for which specialized data management, programming skills, and costly implementations aren't an issue.

Hadoop can also play useful roles in building future enterprise data hubs. It can be difficult to manage (depending on how you decide to manage it, with or without a third-party) but it also provides a lot of advantages.

MongoDB

MongoDB is a great choice in terms of caching and scalability.

It also plays a great role in web development because it makes it easy to pass document-style data from the back end to the front end. This makes it an easy option for companies that create content management systems.

Performance Issues

DynamoDB

For DynamoDB, the following performance characteristics and issues are highlighted:

DynamoDB's pricing model can become very expensive.
Low-latency reads.
Geo-distribution issues.
Not so easy to set up CI/CD.
Identifying the exact key that causes a hot partition is complicated.
Strongly consistent reads are not available for every operation.
ACID transactions (available since late 2018) and secondary indexes both come with restrictions.
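The partition-key issue in the list above can be illustrated with a small simulation. The sketch assumes items are assigned to partitions by hashing the key; MD5 and the partition count of 4 are stand-ins chosen for the example, not DynamoDB's actual internals, and the key names are made up.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    """Map a key to a partition number by hashing it (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def load_per_partition(keys, partitions):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, partitions) for key in keys)


# 1,000 requests spread over 1,000 distinct keys...
even_keys = [f"user-{n}" for n in range(1000)]
# ...versus 1,000 requests where a single key receives 900 of them.
skewed_keys = ["user-0"] * 900 + [f"user-{n}" for n in range(100)]

print("even:  ", sorted(load_per_partition(even_keys, 4).values()))
print("skewed:", sorted(load_per_partition(skewed_keys, 4).values()))
```

With the skewed workload, one partition absorbs at least 900 of the 1,000 requests no matter how many partitions exist, and the per-partition totals alone do not reveal which key is responsible. That is why tracking down the offending key takes extra work.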

Hadoop

For Hadoop, the following performance issues are highlighted:

DataNode and NameNode slowdowns.
MapReduce data locality.
TaskTracker performance and effects on shuffle time.

MongoDB

For MongoDB, the following performance issues are highlighted:

It is vital to design indexes in conjunction with access patterns and schema.
Problems with large objects and unusually large arrays.
Settings for security and durability remain a concern.
There is no cost-based query optimizer; the query planner chooses plans empirically.
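The point about designing indexes around access patterns can be shown by counting how many documents a query has to examine with and without an index. The sketch below is plain Python over a list of dicts, not MongoDB itself; scan, build_index, indexed_lookup, and the orders data are hypothetical.

```python
def scan(collection, field, value):
    """Collection scan: every document is examined."""
    examined = 0
    matches = []
    for doc in collection:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(collection, field):
    """Map each value of one field to the positions of matching documents."""
    index = {}
    for position, doc in enumerate(collection):
        index.setdefault(doc.get(field), []).append(position)
    return index


def indexed_lookup(collection, index, value):
    """Only the documents listed in the index entry are examined."""
    positions = index.get(value, [])
    return [collection[p] for p in positions], len(positions)


# 10,000 orders spread evenly over 100 customers.
orders = [{"order_id": n, "customer": f"c{n % 100}"} for n in range(10_000)]
by_customer = build_index(orders, "customer")

_, scanned = scan(orders, "customer", "c7")
_, looked_up = indexed_lookup(orders, by_customer, "c7")
print(scanned, "documents examined by scan;", looked_up, "via the index")
```

The index only helps queries that filter on customer. A query on another field would still scan everything, which is why indexes have to follow the queries the application actually runs.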

Third Party Tools

Besides these differences, it is always interesting to see what tools are floating around to help further support each of these systems.

So, let's take a look at a few:

Rockset

Rockset is a scalable, reliable search and analytics service in the cloud that makes it easy to build fast operational applications on TBs of data, simply using SQL.

That's the big benefit of Rockset. Using this tool, your team doesn't need to be familiar with another query language.

NoSQLBooster

NoSQLBooster is a GUI developed for managing MongoDB. In addition, it allows you to query using both SQL syntax and MongoDB syntax.

So, not only does it make managing your databases easier (think SQL Server Management Studio), but it can also make it easier for analysts to run queries to answer business questions.

Sqoop

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. It could also be called an ETL tool of sorts, and it helps make interacting with Hadoop easier.
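To give a feel for what a bulk transfer from a relational database into files involves, here is a small sketch using only Python's standard library. It is not Sqoop and does not use Sqoop's interface: export_table, the orders table, and the in-memory SQLite database are assumptions for illustration, and the output goes to a text buffer where a real job would write to a distributed file system.

```python
import csv
import io
import sqlite3


def export_table(connection, table, out_file, batch_size=1000):
    """Copy every row of a table to CSV in batches; return the row count.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cursor = connection.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out_file)
    writer.writerow([column[0] for column in cursor.description])  # header row
    exported = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # avoid loading the whole table at once
        if not batch:
            break
        writer.writerows(batch)
        exported += len(batch)
    return exported


# Hypothetical source database with a tiny table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

buffer = io.StringIO()
count = export_table(conn, "orders", buffer, batch_size=1)
print(count, "rows exported")
```

Reading in batches keeps memory use flat regardless of table size, which matters once the source table no longer fits on one machine.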

Conclusion

DynamoDB, Hadoop, and MongoDB are all very different data systems that aren't always interchangeable. Each system has its pros and cons as well as its own use cases.

The key points highlighted above are intended to help you make better decisions about these systems. Depending on the size of your organization, adopting any of them can bring support for highly diverse data types, efficient application management, and more.

