MongoDB has no use case

Ankush Thakur on August 29, 2018

Ever since the hashtag #healthydebate was launched, I've not been able to sleep because of over-excitement. There are so many options in developmen...
 

My 2 cents:

MongoDB is great for quick prototyping. You want to get things done quickly? Get MongoDB, and worry about the problems later. Need a bit more reliability because your prototype is now dealing with real data? Set up a replica set; it's easy.

Nowadays I just don't see a use case for RDBMSes - their "consistency" guarantees cause nothing but trouble for you if you want to scale, or change something. Set up sharding and you'll have to delete your foreign keys anyway. End up with a table with a few billion rows and changing the structure is going to be a fair bit of work regardless.

The worst thing about RDBMSes is that they do practically nothing to help you solve the problems modern applications face. They expect you to pour massive amounts of cash into DBA consultants when you have problems they didn't think were worth solving in the core product. MariaDB + Galera might be the only bright light in the area, but even then you have to deal with the fact that it's still MySQL: grimoire.ca/mysql/choose-something...

I'd rather deal with many of the problems in code, writing migration scripts, using things like default values on my classes, etc. than expect my database to do anything but create new problems if I try to do anything too fancy with it.

Also people need to get out of the single database mindset. Use what's good for the job you need done. Want search? Use ElasticSearch. Want time series data or a massive key-value store? Maybe Riak is right for you. Want to get something prototyped quickly? MongoDB isn't going to fight you.

Then when it comes to these invented "examples" like this:

Use MongoDB, and you will be shaking at the knees on that day when you're asked to generate a report that concerns fields that are embedded 43 levels deep across multiple objects!

If you're an idiot, no database is going to save you. If you're not an idiot, your choice of database isn't going to make you into one. I.e.: don't use a 43 level nested document. It's probably stupid, regardless of your specific use case.

 

Set up sharding and you'll have to delete your foreign keys anyway. End up with a table with a few billion rows and changing the structure is going to be a fair bit of work regardless.

Words of gold! This reminds me of an app (from an earlier job) where there were four different MySQL databases as part of four different services. So, of course, foreign keys were not possible.

The worst thing about RDBMSes is that they do practically nothing to help you solve the problems modern applications face. They expect you to pour massive amounts of cash into DBA consultants when you have problems they didn't think were worth solving in the core product.

You mean, from the point of view of scaling? Otherwise, I guess MongoDB too would need some tuning that only an experienced DBA can do?

If you're an idiot, no database is going to save you. If you're not an idiot, your choice of database isn't going to make you into one.

Yes, of course, but please realize that this article was written in the spirit of fun. I don't mean any of the vitriol contained in it, and I hope that much is clear from the disclaimer and the post-script.

I'd rather deal with many of the problems in code, writing migration scripts, using things like default values on my classes, etc. than expect my database to do anything but create new problems if I try to do anything too fancy with it.

Could you please expand on that?

Also people need to get out of the single database mindset. Use what's good for the job you need done.

Yes, exactly. Which is why this post targets MongoDB specifically.

Want to get something prototyped quickly? MongoDB isn't going to fight you.

So, you are of the opinion that MongoDB is best left to quick prototypes only?

Finally, thanks a lot for adding in your comment. Means a lot to me!

 

You mean, from the point of view of scaling? Otherwise, I guess MongoDB too would need some tuning that only an experienced DBA can do?

Largely that, but all kinds of other little things too. On scaling specifically, MongoDB does fairly little (though replica sets are super easy to set up), but things like Riak make most of that work quite a bit easier.

Yes, of course, but please realize that this article was written in the spirit of fun. I don't mean any of the vitriol contained in it, and I hope that much is clear from the disclaimer and the post-script.

Sure, but the statement is sort of difficult to phrase in another way without losing the meaning. It's often annoying how people are stuck in the mindset that their DB choice is what makes their application great or not, or that it saves them from programmer errors.

Personally I find that the DB choice matters fairly little in the end, but of course specific use cases should be thought of. What matters most is if you choose a DB that's constantly going to fight you, or if you choose a DB that will let you get on with your work.

Also, let's face it: it's your programmers who write your DB definitions. Guess where they can make errors if you let them put logic, limitations, etc. in the DB? Also, fixing your DB tends to be more work than fixing your code, largely because people's expertise is with code, not the DB.

Could you please expand on that?

Ok, so say you have a user table and you want to add a birthday column. You don't know the value for the existing users, so you have to set the default to NULL or similar. You can either create a column in SQL with a default value of NULL, or you can use a document store: leave the objects in the database as they are, and in your model class in your code you just add the birthday field with NULL as the default value. Then, if you've written your DB code nicely, it will only set the values that were present in the DB, and you end up with a NULL birthday.
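A minimal sketch of the document-store side of this, assuming a model class that copies only the fields present in the stored document (all names here are invented for illustration):

```python
# A model that tolerates documents written before the "birthday"
# field existed: anything missing falls back to a class-level default.
class User:
    DEFAULTS = {"name": None, "email": None, "birthday": None}

    def __init__(self, doc):
        # Copy only what the stored document actually contains.
        for field, default in self.DEFAULTS.items():
            setattr(self, field, doc.get(field, default))

# An "old" document, written before the birthday field was added:
old_doc = {"name": "Ada", "email": "ada@example.com"}
user = User(old_doc)
print(user.birthday)  # None, with no migration run at all
```

The old documents never need to be touched; the default only materializes when the object is loaded.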

Similarly even slightly more complicated migrations, such as merging first name & last name to one field, calculating a hash of a field for faster indexing, or some such can be done with code in more interesting ways.

If you run a query like UPDATE user SET name_crc=CRC32(name), you have no control over the rate at which it processes data, it might cause pretty heavy load on your DB depending on the DB and the underlying hardware.

If you make a script that runs through all the entries in your DB and does that in code, then you have much more control over the rate and other such details. It might take a bit more work, but after you've written your first one like that you can just reuse the same logic. You additionally gain things like the ability to lock each entry with a distributed lock across your code and you won't end up with race conditions:

  1. A request incoming to your API loads the DB row
  2. UPDATE user SET name_crc=CRC32(name) processes the row
  3. Request completes and saves the data in it to the DB

In THIS specific example it doesn't matter much (your code should probably set the CRC32 properly at this point), but in many other cases it does. If instead you do in a script:

  1. Get a distributed lock for your DB object
  2. Load the DB object
  3. Process it
  4. Write it
  5. Release lock
  6. Next item

Assuming you use the same locking mechanism across your software (you should), then you can be pretty sure nothing else changed the underlying object at the same time.
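A rough sketch of that loop, with a dict standing in for the database and a plain set standing in for the distributed lock service (etcd, Redis, etc.); every name here is invented:

```python
import time
import zlib

db = {1: {"name": "Ada"}, 2: {"name": "Grace"}}  # stand-in for the real DB
held_locks = set()  # stand-in for a distributed lock service

def acquire_lock(key):
    if key in held_locks:
        return False
    held_locks.add(key)
    return True

def release_lock(key):
    held_locks.discard(key)

def migrate(rate_limit_per_sec=100):
    # Per entry: lock, load, process, write, release, throttle.
    for key in list(db):
        if not acquire_lock(key):
            continue  # someone else holds it; pick it up on a later pass
        try:
            row = db[key]
            row["name_crc"] = zlib.crc32(row["name"].encode())
            db[key] = row
        finally:
            release_lock(key)
        time.sleep(1.0 / rate_limit_per_sec)  # control the load on the DB

migrate()
print(db[1]["name_crc"] == zlib.crc32(b"Ada"))  # True
```

The sleep is the rate control you don't get with a single bulk UPDATE, and the lock is what prevents the race described in the numbered list above.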

So, you are of the opinion that MongoDB is best left to quick prototypes only?

No, but it's where it really shines. MongoDB does have a bunch of issues, but it's not like many of the same issues don't exist in other popular databases. It's basically picking your poison.

I've personally found MongoDB scales fairly effortlessly to decent sized environments, but I would probably try to find more use-cases for Riak, Cassandra, and others as the load grows to make sure I don't rely just on MongoDB too much.

Sorry for the late reply!

or you can use a document store: leave the objects in the database as they are, and in your model class in your code you just add the birthday field with NULL as the default value.

I don't really see how missing fields are better than fields with NULL values. Ultimately, in the code, they'll both respond to logical comparisons equally nicely. 👀

If you make a script that runs through all the entries in your DB and does that in code, then you have much more control over the rate and other such details.

Again, I see no reason why we can't do this in a relational database.

Your distributed lock example, though, hits right home. I don't think this is something we'll ever manage in something like MySQL. Even read locks are tricky, from what I gather.

Personally I find that the DB choice matters fairly little in the end [...]

Honestly, I'm beginning to think that way, and the more I dive into discussions by senior developers, the more I see that foreign keys and other "nice" constraints are actually a stumbling block in the long run.

In the end, though, I have a nightmare scenario that keeps me from embracing document databases. Let's take the classic example of a blog: you store the posts and user's comments as an aggregate, and it makes a lot of sense. Imagine, though, that the management wants to know which comments were posted between 4 pm and 8 pm on a certain date. If it's a document store, I'll have to scan through all the aggregate fields. 😭

BUT, I'm also not sure if a SQL BETWEEN is going to be very useful as the number of rows hits millions.
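For what it's worth, the planner usually handles this fine if the timestamp column is indexed. A quick sketch with sqlite3 standing in for any RDBMS (table and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, created_at TEXT)")
conn.execute("CREATE INDEX idx_comments_created_at ON comments (created_at)")
conn.executemany(
    "INSERT INTO comments (body, created_at) VALUES (?, ?)",
    [("hi", f"2018-08-29 {h:02d}:00:00") for h in range(24)],
)

# Ask the planner how it would execute the range query:
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM comments "
    "WHERE created_at BETWEEN '2018-08-29 16:00:00' AND '2018-08-29 20:00:00'"
).fetchall()
print(plan)  # the plan mentions idx_comments_created_at, i.e. an index range scan
```

With an index, a BETWEEN is a range scan over a B-tree, so the total row count matters far less than the size of the matching range.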

Thoughts? :-)

I don't really see how missing fields are better than fields with NULL values. Ultimately, in the code, they'll both respond to logical comparisons equally nicely. 👀
If you make a script that runs through all the entries in your DB and does that in code, then you have much more control over the rate and other such details.

Missing value does not always equal NULL, and you might want your default to be different from NULL while keeping the field nullable. Lots of cases. Also, it's not so much about "not being able to" (I believe nowadays MOST SQL databases can do live schema changes without the whole database freezing for a long time) as about habit: RDBMS users tend to just think differently.

Your distributed lock example, though, hits right home. I don't think this is something we'll ever manage in something like MySQL. Even read locks are tricky, from what I gather.

MySQL can function as a nice host for that to some extent, with GET_LOCK(), but it can't scale past one server easily. Really you want something more like etcd.
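As a hedged illustration, GET_LOCK()/RELEASE_LOCK() are real MySQL functions and wrap naturally in a context manager; the FakeCursor below is an invented stand-in so the sketch runs on its own (with a real connection you'd pass conn.cursor() instead):

```python
from contextlib import contextmanager

class FakeCursor:
    """Stand-in for a MySQL cursor; records the SQL it is asked to run."""
    def __init__(self):
        self.executed = []
    def execute(self, sql, params=()):
        self.executed.append(sql)
    def fetchone(self):
        return (1,)  # pretend the lock was granted

@contextmanager
def named_lock(cursor, name, timeout=10):
    # GET_LOCK returns 1 when the lock is acquired, 0 on timeout.
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout))
    granted = cursor.fetchone()[0] == 1
    try:
        yield granted
    finally:
        if granted:
            cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))

cur = FakeCursor()
with named_lock(cur, "migrate:user:42") as ok:
    if ok:
        pass  # do the protected work here
print(cur.executed)  # GET_LOCK first, RELEASE_LOCK last
```

As noted, though, these locks live on one MySQL server; for multi-node coordination something like etcd is the better fit.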

Let's take the classic example of a blog: you store the posts and user's comments as an aggregate, and it makes a lot of sense. Imagine, though, that the management wants to know which comments were posted between 4 pm and 8 pm on a certain date. If it's a document store, I'll have to scan through all the aggregate fields.

There is such a thing as the management being wrong, and you, as the expert, being required to tell them they're wrong.

Also, you can design your schema wrong in any database, why is it the fault of the database that you made a mistake? If this is a change in requirements, why is it out of the question to spend development effort to change the data structure?

Lastly, you should pick the right tools for the job, not try to do everything in the one tool you happened to choose. In the case of a database, if you have read-only replicas for analytics etc., you can pretty safely run even heavy queries there (MongoDB can probably do more complicated queries than MySQL with less effort due to its aggregation and map-reduce systems), but really, when you're starting to do more complicated searches you should use a system designed for complicated searches, such as ElasticSearch.

There's also denormalization for lazy (and smart) people, just save the data twice in different structures, so it can be fetched optimally in your different use-cases.

 

If you want to do microservices where each service has its own DB, you'll have no foreign keys anyway. You'll only have a REST API and have to join things together. Or you can materialize views based on the distributed databases for reading and invalidate caches (LOL, that'll be fun) when you write things.

 

Is it just me, or is #1 anti-NoSQL and nothing to do with Mongo directly?

2 . From my limited experience, I've heard that you do not use Mongo for large scale (TBs) of data; you use Cassandra and other "big guns".

4 . In my experience the writes are slow, but maybe it's just me. Anyway, you compare it to SQL engines, so it's not exactly a valid reason.

5 . Again, you compare to SQL databases. Why would you log to a relational database, if you do not have relationships? Anyway, time-series databases are better suited for logs and analytics, or key-value stores.

Overall, you could rename it to "NoSQL has no use case"; your beef is not related to Mongo from what I can see.

 

Overall, you could rename it to "NoSQL has no use case"; your beef is not related to Mongo from what I can see.

Not at all! Redis excels at caching, while graph databases have no parallel in the RDBMS world. It's MongoDB that bothers me the most, which is why I wrote this.

Why would you log to a relational database, if you do not have relationships?

Well, how about I don't want to use a separate database just for logging? What's wrong with RDBMS for logging?

In my experience the writes are slow, but maybe it's just me. Anyway, you compare it to SQL engines, so it's not exactly a valid reason.

From what I read, MongoDB provides very fast writes in exchange for not-so-strong consistency.

Finally, I don't want the discussion to detract from MongoDB. The question is: what is MongoDB good for that, say, relational databases are not?

 

Just search for "best use cases for NoSQL" or MongoDB; you'll be amazed at what great things can be accomplished when you exit your one-language/one-database bubble.

You can also read some research papers, starting with the Dynamo paper, to see the tradeoffs and why/when you cannot use SQL.

BTW, RDBMS storage engines actually use a NoSQL implementation behind the scenes, usually key/value tables and other data structures for indexes, so in a way SQL does not exist; it's just syntactic sugar that adds complexity.

As for MongoDB, it may not excel on technical points, but it sure helped a lot to bring NoSQL to the mainstream thanks to its ease of use. So say thanks to them, don't use it, and move on :D.

All good points!

Yes, DynamoDB has been getting a lot of attention. I'm sure Amazon didn't pour effort into it simply because they wanted to show off!

As for MongoDB, it may not excel on technical points, but it sure helped a lot to bring NoSQL to the mainstream thanks to its ease of use. So say thanks to them, don't use it, and move on :D.

That's a fair argument, too! xD So, what would be a better but similar alternative? Riak?

RDBMS storage engines actually use a NoSQL implementation behind the scenes

Again, the implementation doesn't matter for application programmers; only the interface does.

All in all, thanks for your time and answers!

Yes, DynamoDB has been getting a lot of attention. I'm sure Amazon didn't pour effort into it simply because they wanted to show off!

No, they really needed it, like Google needed BigTable and Amazon needed Dynamo; you can see here one "small" (as in huge, but only one) portion of the Amazon website migrating from hundreds of Oracle instances to Dynamo.

I think the technology and knowledge have evolved, and now both companies and others have built SQL-like databases that scale horizontally, hiding the NoSQL behind a SQL query processor; see Google Spanner, Amazon Aurora, and YouTube's Vitess.

So, what would be a better but similar alternative? Riak?

For me, Mongo is a viable solution (for document storage) for small/medium projects, and the nearest alternative is CouchDB.

If you know you will cross the few-terabytes threshold I would recommend Cassandra, and if you do have a DevOps team, a cloud-managed database.

It's not that MongoDB could not handle it; I'm sure there are huge clusters out there somewhere, but I've heard sharding Mongo is an operational pain, while Cassandra does it by default and probably better.

BTW, RDBMS storage engines actually use a NoSQL implementation behind the scenes, usually key/value tables and other data structures for indexes, so in a way SQL does not exist; it's just syntactic sugar that adds complexity.

Key-Value tables aren't the lowest level, either. Real Programmers write their apps directly on top of the block device interface.

 

But aren't time-series databases also No-SQL? My understanding is that No-SQL means database technology that isn't relational.

 

Yes, but they are a different type of NoSQL than MongoDB. I'm just saying that neither SQL nor Mongo is suited for logs.

 

So, there are a few things that have changed in Mongo over the years that make it better. One is the removal of the global lock; the other is the introduction of the oplog. The former helps performance when mixing reads and writes, the latter the durability of writes. Now let's tackle a big misconception: yes, the database is schemaless, but your data should not be. You should have a schema that's managed by your application; what gets written to disk is irrelevant. Yes, the tool allows you to be dumb, but you shouldn't be. Mongo is kind of unique in that it supports map-reduce alongside the data storage, as well as the aggregation pipelines. Those are extremely powerful and allow a large range of data processing that's not available with normal RDBMSes.
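To make the aggregation point concrete: a pipeline is just an ordered list of stage documents. A sketch in pymongo style (collection and field names invented; against a live server you would run db.orders.aggregate(pipeline)):

```python
# Each stage transforms the stream of documents flowing through it.
pipeline = [
    {"$match": {"status": "complete"}},                                  # filter
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$total"}}},  # aggregate
    {"$sort": {"revenue": -1}},                                          # order
    {"$limit": 5},                                                       # top 5
]
print(len(pipeline))  # 4 stages
```

Because stages are plain documents, pipelines can be composed and generated programmatically, which is where much of their power comes from.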

 

Mongo is kind of unique in that it supports map-reduce alongside the data storage, as well as the aggregation pipelines. Those are extremely powerful and allow a large range of data processing that's not available with normal RDBMSes.

Thank you, thank you! That was certainly very helpful. I will read up more on it.

 

Still, no ACID, no integrity constraints, no efficient backup/upgrade solution.

I'd see MongoDB fit for data cache, but I couldn't see it in any other use case.

 

ACID is out, and there were keys and indexes from the start. What is next, joins? 😂

Saying MongoDB finally has XYZ is like saying PHP finally has namespaces and JavaScript finally has dependency injection. It helps, sure, but only if you're stuck in that ecosystem.

The problem I see here is that a lot of people over-abuse MongoDB like it's a general-purpose DBMS, and they put themselves under a lot of technical debt for this choice: hype-driven choice and not technical-driven.

In most use cases, the core data you exploit is relational and it makes sense to use a RDBMS, but an awful lot of users decide that having a constrained and validated model handled by the RDBMS is too constraining for them, or that they just prefer to use "the most awesome solution overall" and blindly choose their tools based on pure hype/laziness and not technical need.

I dunno, I feel it's more common for people to over-abuse RDBMSes like they're general purpose databases ;)

RDBMSes have drawbacks, MongoDB and other databases have their drawbacks.

If you assume RDBMSes are the one right "general purpose database" choice, then you're wrong. MongoDB and others like it can fill that role perfectly fine, for many cases.

Never choose your tools blindly, and this includes the choice of RDBMS vs. NoSQL. The "benefits" you get from RDBMS are going to make your life a living hell if you ever need to scale out past one node (except maybe in a Galera cluster type setup), or to a sharded environment.

In a sharded environment pretty much all benefits of RDBMSes are gone, and the remaining features just add more pains - now you have X databases storing data for the same thing, and their strict structures, restrictions, etc. can be out of sync with each other unless you want to take your whole app down during the schema change. Whoopee.

Oh also as a random point that came to my mind: please everyone stop implementing searches on SQL databases. They're awful at it.

I agree with that.

The thing is, most data models are relational.

Based on that same argument (or a very similar one) you could probably just say most people should be using a graph database, instead of an RDBMS.

Just because you know RDBMSes already, or because they're widely used, doesn't mean they're the right tool for the job.

[...] unless you want to take your whole app down during the schema change. Whoopee.

Now that's a VERY good point. I can imagine how much pain it will be to try to wrestle with an RDBMS when doing schema changes. Hmmm ...

 

False on backups: there are many managed MongoDB services out there. For many use cases you don't need ACID, and the majority of NoSQL databases do not support it except as an option. If you just want a cache, then use Redis.

 
 

Meanwhile $MDB trading at $72 a share on strong and continued growth.

Javascript is bad is the only valid argument mentioned above

Facebook and Google both use MongoDB

There are many applications beyond the realm of webapps

 

Meanwhile $MDB trading at $72 a share on strong and continued growth.

And how does that make it a better database?

Javascript is bad is the only valid argument mentioned above

Actually, I'd say the opposite. 😊 With that slap image I wanted to convey that it doesn't matter, but working with JavaScript-like syntax helps.

Facebook and Google both use MongoDB.

Really? What for? Also, didn't these companies exist long, long before MongoDB?

There are many applications beyond the realm of webapps

So, do you concede that MongoDB isn't suitable for web apps? The question remains: what other applications are you talking about?

Thanks for jumping in!

 

What are the known applications running in Facebook and Google that use MongoDB?

 

MongoDB totally rocks. The schemaless approach means I can finally decouple from the domain. MongoDB has opened the door to a level of flexibility that SQL simply can't.

 

There is no such thing as a schemaless database, just databases that don't support validating data. Which means your code has to do it. It also means that the schema is your whole code base. And of course you are never sure all changes to the schema were applied to the whole database, so you put null checks everywhere.

 

I don't have time to write an in depth article addressing how I'm using MongoDB, but I have written a previous article discussing the architectural principle of domain decoupling. This article is actually more a case against domain driven design.

dev.to/cheetah100/domain-driven-di...

I produced a video about the dangers of coupling to the design here:

youtube.com/watch?v=uwZj4XF6zic

I have also produced a video about the solution I developed which implements domain decoupling here:

youtube.com/watch?v=vUxKeaMjDo0

In Gravity the data structures are defined by runtime configurations that are also stored in MongoDB. This means that users can define new data structures at runtime along with detailed business process rules without a line of code.

The project started with the JackRabbit JCR as the data store, but recently moved to MongoDB. Both JackRabbit and MongoDB opened the door to writing data-decoupled applications. This application automatically handles issues like security and ownership. And now, with the help of Mongo aggregations and lookups, we are able to provide functionality similar to SQL queries.

Thank you for the detailed answer! I will spend some time going through these links and what you wrote, and come back to comment.

 

Exactly, the schema constraints just move to your application code, where there is a far greater chance of making an error. Plus you also have to duplicate the constraint logic in every new service you write.

Thank you for chiming in! I wish there was a single way to settle this. :)

 

Thank you for this! To be fair: I wouldn't call MongoDB useless. It's really pretty neat, and I think its value as a JSON document storage tool is real. That said...

I've been a developer for 25 years and a professor of software development for about 7. My latest round of classes includes a week of MySQL, then a week of MongoDB a few weeks later. Students commonly ask why people should bother with a relational database when they can have the ease of something like Mongo. And in the context of a few weeklong lessons, they're right! But I've had to explain that all of the hard stuff in relational databases (defined schemas, foreign-key-based design, etc.) is there because it keeps terrible things from happening to data, and that tossing that stuff out the window should only be done when you realize what you're giving up as well as what you're gaining.

 

Playing the devil's advocate here, Mongo is good for quick prototyping when you just launch a service. Once you have an idea of how the data is shaped, move to Postgres or MySQL. Postgres supports BSON/JSON datatype in later versions so Mongo is not needed for even that.

 

I agree that it's good for prototyping. What concerns me is:

1) The tool that you prototype with often ends up as the tool you use in production because you don't have the time to migrate it later. This has happened to all of us, I'm sure :)
2) Sometimes - especially with data models - people should really take time at the beginning and plan it out as much as possible. Having a schema-less database is more often an excuse to be lazy about planning, in my experience.

That said: I think that an experienced developer, who knows what NoSQL can and can't do, will be great with either type of db.

I agree. There is no better alternative to thinking about your data model upfront. Some devs I know start out with NoSQL because of an (irrational) fear of changing a schema in production. Their process is to run an MVP for a month or so, till feature requests slow down, and then build the schema.

I can never convince myself that MongoDB saves non-trivial amounts of time when prototyping. How long does it take to design a schema? You have to write migrations anyway. And if something doesn't work, just drop the database, change the seeders a little, and rerun the migrations. For a prototype that takes a month to complete, MongoDB would save a day at best.

I agree! Especially given how good MySQL has gotten with migrations and the beginnings of sharding.
I'm thinking of writing a post here where I beg the community to convince me that I'm wrong about NoSQL.

 

Postgres supports BSON/JSON datatype in later versions so Mongo is not needed for even that.

Epic! 😂

But then, once the data set gets too large, does it perform the same or better? What about map/reduce pipelines, as some comments have pointed out?

Mongo does have an easy way of sharding and clustering. Earlier versions of Mongo suffered from unreliability during network partitions but they have improved it.

There are Postgres extensions which do the same. It depends on what you are doing with the data, though. For example, queries that are simple in Postgres need separate application code when retrieving from Mongo, thus necessitating more nodes for the same task.
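A small sketch of documents living inside a relational table. sqlite3's JSON functions stand in for Postgres's jsonb operators here (in Postgres you would write doc->>'user' instead of json_extract(doc, '$.user')); all table and field names are invented:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO events (doc) VALUES (?), (?)",
    (
        json.dumps({"type": "signup", "user": "ada"}),
        json.dumps({"type": "login", "user": "grace"}),
    ),
)

# Filter and project on fields inside the documents, DB-side:
rows = conn.execute(
    "SELECT json_extract(doc, '$.user') FROM events "
    "WHERE json_extract(doc, '$.type') = 'signup'"
).fetchall()
print(rows)  # [('ada',)]
```

The point is that "document data" and "relational database" are not mutually exclusive: the documents stay schemaless while the surrounding table keeps its keys and constraints.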

 

You're welcome! No, I don't think MongoDB is useless, but I don't understand it at all. That's why I started this discussion.

So, some of the comments here suggest that once a database is split up (which is often the case) you lose foreign keys anyway. And so, they say, MongoDB is better. What would you say to that?

Also, since you teach MongoDB, there has to be something to it. What is it good for? :)

 

This article tells me that the author has never had a use case for document-storage. The fact that there are absolutely no counter-points for when a document database IS a good option leads me to believe this article is nothing more than FUD.

 

The above comment tells me that the poster isn't familiar with the #healthydebate hashtag, and didn't bother reading the introduction and postscript. Since he skimmed over the article and provides absolutely no positive uses for MongoDB, it leads me to believe that his comment is nothing more than DUD. 😏

 

The problem is not with noSQL databases itself, but with people who blindly want to force a clearly relational data structure into noSQL, "because it's the f*cking future". Or not, probably not, well, most of the time it's not...

noSQL databases are usually key-value stores, which can be used for many things other than storing persistent data that has to be IN RELATION with other data. Consider Redis, which is also a great way to speed up your RDBMS, meaning fewer reads/writes hitting your DB, with data cached in memory.
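The Redis point is the classic cache-aside pattern. A sketch with a plain dict standing in for Redis and a stub standing in for the expensive RDBMS read (all names invented):

```python
import time

cache = {}  # key -> (value, expires_at); stand-in for Redis
TTL = 60    # seconds

def slow_query(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}  # pretend DB hit

def get_user(user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                        # served from the cache
    value = slow_query(user_id)              # cache miss: hit the DB...
    cache[key] = (value, time.time() + TTL)  # ...then populate the cache
    return value

first = get_user(7)   # goes to the "DB"
second = get_user(7)  # served from memory
print(first == second)  # True
```

With real Redis the cache.get/cache[key] pair becomes GET and SETEX, but the read-through shape is the same.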

 

Had to dig way too much to find your answer. That's exactly what I see as the issue.

There's no point saying "my software managing financial transactions, which needs absolute integrity, is now going to use MongoDB because internet says it's good". It's pointless, since you're basically choosing to shove lots of manual checks in your code just because of "that fancy database".

 

Yes, I see clear uses for Redis, and say, graph databases, but I've never understood MongoDB. I wonder what the fuss is all about when it comes to the MEAN stack, so I thought I'd create a hilarious post.

 

What a useless rant. I have worked with all kinds of databases for decades: network, relational, various NoSQL. I have implemented my own relational database core. I have worked below the SQL level as well as through SQL many times, and I have produced ORMs on top of many types of database technology. I even worked for some years on an object database.

MongoDB is quite sound database technology. It still has one or two warts, but so does most everything else, including the most well-regarded relational databases. Overall I am much happier with it for variable document storage, and for flexibility in schema, than with any other easily accessible database technology.

It wins hands down over a pseudo-database like DynamoDB. I can't believe how much low-level knowledge and calculating that one forces on users.

So what exactly are you comparing it to?

 

If you take the trouble of reading the disclaimer at the top of the article, you won't find it a "useless rant".

 

The biggest turn-off, for me, is the simple fact that MongoDB/object DBs enforce the persistence of OPINIONATED data structures. The concept of relationships does not inherently exist within the DB realm; they must be enforced by an external program (correct me if this has changed, or is downright wrong).

Yes; I support the notion that document databases are useful during prototyping. They are also, arguably, useful for caching data served from a JSON-powered integration; if the data is only refreshed daily, there's no need to call the API endpoints on each request (just cache it, as is!).

With that said, if your data outgrows the confines of a simple transaction-layer CRUD application and moves into the realm of data analytics, you will be spending more time destructuring your objects and restructuring them into either normalized data sets or into the structures that you require for said task. This is essentially a form of technical debt if you're coming from an RDBMS. The data that would have been stored in normalized, enforced, separate entities is now grouped together into an opinion: the opinion of the application that collected the data, but no longer the opinion needed.

As for the fallacy that MongoDB is 'faster' than SQL-supporting RDBMSes, there's plenty of data on the net, and at your library/uni, to support the contrary. Just my ignorant 2 cents.

 

if your data outgrows the confines of a simple transaction-layer CRUD application and moves into the realm of data analytics, you will be spending more time destructuring your objects and restructuring them into either normalized data sets or into the structures that you require for said task

This remains the single biggest reason I still haven't made the mental switch. Or let's say, this is the only reason standing in the way. I think all other features are most welcome, but having to wrangle with data again and again ... no, no! Maybe it makes sense to use MongoDB as an analytics cache that is constantly being updated.

As far as the fallacy that MongoDB is 'faster' than SQL-supporting RDBMSes goes, there's plenty of data on the net, and at your library/uni, to support the contrary

Yup. The biggest bottleneck is the network, at least as far as I can tell. Whether a DB read takes 20 nanoseconds or 200 microseconds doesn't make a tangible difference. 🤔

 

As with lots of other non-relational databases, as well as plenty of big data technologies, trying to apply them to your regular transactional problems leads to nothing, since they're not meant for that.

The first thing you should consider is that noSQL as a whole is meant for large volumes of data that won't lead to someone's death if a few records are inconsistent. I know it's a hard thing to imagine, but it's a thing that happens.

Also, to borrow Duke's answer, large-scale scenarios lead to multiple databases running for multiple purposes. Fast-changing data and schemas, as in analytical processes, can gain hugely from this kind of structure, since new data can be loaded and used faster than with an RDBMS.

I've used MongoDB (and other document databases) for fairly small personal projects, but the development speed, especially considering the analytical scenarios I mentioned, was definitely worth it.

 

The first thing you should consider is that noSQL as a whole is meant for large volumes of data that won't lead to someone's death if a few records are inconsistent. I know it's a hard thing to imagine, but it's a thing that happens.

Awesome! I feel like I should get these words framed in my living room. Maybe I actually will. :)

Also, to borrow Duke's answer, large-scale scenarios lead to multiple databases running for multiple purposes. Fast-changing data and schemas, as in analytical processes, can gain hugely from this kind of structure, since new data can be loaded and used faster than with an RDBMS.

Ah, I see. I now remember someone saying that if they wanted to store analytics data, they'd use MongoDB, and it kind of makes sense now. So the benefit is that when we have to rename (columns?), we can do that immediately without having to wait for table locks and all? Also, why does changing schema emerge naturally in analytics? Also, what about all the checks the code has to perform to insert/retrieve this data?

 

Especially when considering data science: when you work with predictions of any kind, your aim is to minimize the error in an efficient way. That means you can't simply decide that, to create the best model, you're going to analyse every piece of data available.

What you do is run little experiments, grabbing all the available data you have, which sounds great in theory, but in practice you're bound to go back and fetch a new data source every now and then, for multiple reasons: websites' downtime, legacy systems that need to provide endpoints to extract the data, data governance policies that delay the data extraction, etc.

With all that going on, you're left with an iterative approach that changes the schema indefinitely.

Just a note: I'm currently using Postgres for an analytical experiment, and it's working just fine, especially because I've limited the scope I'm working with (because I don't have the time to extract any other data), which leads to few schema changes, which makes it great. So again, no silver bullet, it always depends on your scenario :) And holy crap I'm loving that hashtag hahaha

 

Really interesting to see the comments. People think that SQL databases bring problems, when they were built to solve problems, decades ago, such as: consistency, backups, numeric datatypes, multi-purpose modeling, flexibility, performance, ease of use,... Something was probably completely messed up in the Dev - DBA communication (and I admit it is mostly because of some DBAs' attitude) for such a misunderstanding.
Result: we are back to technologies from the 80's with unstructured, hierarchical, scattered, record-based storage and 3rd-generation languages. Impossible to admin, monitor and back up, with databases of different technologies spread everywhere. Impossible to scale without throwing in many cores and servers, with all the wasted CPU cycles to synchronize them.

A good read is the introduction of "A Relational Model of Data for Large Shared Data Banks" by E. F. Codd. Relational databases and SQL were built to solve those problems thanks to abstraction of the internal representation (logical view), flexible structure (DML and DDL),...

Interesting also to read the reasons why CERN (currently petabytes of time series and logs, all in a relational database) chose a relational database 40 years ago: easy access and evolution thanks to the simple tables model. This choice has proven to be the right one for decades of growing data.

I'll not take all the examples in those comments one by one, but read about SQL views, UNION ALL queries, online DDL, virtual columns,... With all those, the structure can evolve easily, online, on terabyte-scale tables. The "UPDATE user SET name_crc=CRC32(name)" mentioned in the first comment is a really easy one. You don't need to physically change all rows. And even if you change them, just look at the CPU cycles and compare to the one-by-one document approach.

Reading the comments here, I'm sorry that the relationship with your DBAs was so bad that it led to a total rejection of all SQL databases. SQL is a DevOps language: the same language for admin tasks, modeling, development tasks, and end-user ad-hoc queries. SQL databases are there to ease the development and evolution of database applications, with guaranteed persistence (backups) and sharing (ACID) of data - even for terabytes, manipulated by thousands of concurrent users.

 

Yeah, I never really got into MongoDB, nor Ruby. They just seemed like they were hyped by people I didn't really like that much, so I just ignored them. I don't know why I did it; I just felt they were not useful. Everything in the world is relational and RDBMSes are really fast, so if you need non-relational data, just store things in tables without any keys ;) job done!

Then I found some horror stories, and they seemed to validate my previous beliefs. Ruby was a memory hog and Rails even more so; the more I read about it, the more I realised that I wasn't learning anything useful by going in that direction. So I just didn't.

Everybody hates PHP; I hate PHP too. But what I like about it is that I can put it behind PHP-FPM/nginx, host it in a Docker container, mount a volume, hack in the editor and see the results. It's fast, well understood, and you can abstract away the pain.

About MongoDB. I just stuck with postgres and mysql and gave it a rest. I'm really interested to get into Kafka because I want to learn more about event sourcing and materialized views. But that's driven by the idea that event sourcing provides a complete audit trail of db modifications without the need to write this auditing myself. I felt it would be a really nice way to deal with CRM data for example. You'd see every modification and who did it automatically over time and be able to represent those changes as a stream of events.

 

Good discussion. I recently used MongoDB (in fact any NoSQL DB) for a project for the first time, after 20yrs of RDBMS (mostly MySQL & MariaDB).

Why did I do that?

The project was to pull down data from a public resource over a rate-limited API. Eventually about 8 million "documents" with arrays of subdocs, avg size about 10kB, i.e. ~80GB of data. Then to clean, restructure, summarise, query, analyse, visualise and report on that data.

The data came down as JSON, obviously. Schema was complex (ie a lot of fields) and not super consistent (some of it is "legacy" at their end). Data is not "very relational", only about "3-4 main entity types". I wanted to get going quickly and not spend months trying to map the JSON to an SQL schema, only to find that as I pull down more data, I would have to deal with a never ending set of edge cases/legacy exceptions. Data hardly changed (it's a historic "archive" of official records), and just gets added to. NB: I am not in control of the schema of the public resources here, only of the post-processed fields.

So I decided that it might be much easier to get going by just taking the JSON and insert it directly into MongoDB documents. Then indexing it, cleaning it, summarising it (adding yet more fields into the documents), analysing ...etc.

It's worked quite well. I found the MongoDB query and indexing tools pretty good at dealing with the JSON - after an initial learning curve. Much better than what I would have got with a MariaDB JSON object field.

I am just running a single mongod instance, not sharding, or even replicating yet. I have found that I need more memory than I expected. Running queries off the disk is painfully slow when none of the 20+ indexes I have will do for the query at hand. Somehow collection scans are slower than I would expect a table scan to be in an RDBMS. Makes sense due to the variable/complex document structure?

The rest of the app uses an RDBMS, because code existed that expects one.

Was this a bad decision? I am not sure yet, the jury is out. It allowed me to do a lot quite quickly. Have I incurred a "technical debt"? Not that I know of, yet. I have had to do some splitting of the main collection to keep RAM usage in check, see here:

stackoverflow.com/questions/567340...

Could this have been done with an RDBMS? Sure. Would it have been slower to develop? Probably, yes. Would the end product have been better with an RDBMS? Not that I can see right now (notwithstanding above concerns).

By the way I am currently looking into adding a GraphDB layer over the top to help with some of the analysis.

Sorry, maybe that wasn't passionate or opinionated enough? ;-)

So is this a valid use case for MongoDB/NoSQL? I would say: Yes, probably.

Other opinions?

 

This made me laugh so much. I feel the same way about MongoDB. What did it for me and mongo was maintaining data integrity and relationships. MySQL does that for free, so I switched to MySQL and won't be looking back for now.

 

Great article. Definitely agree that MongoDB is awesome for prototypes, but I don't believe it goes beyond that. Work will still need to be done when migrating a collection (I believe that's what they're called), i.e. adding fields and such.

 

To be fair, this article was written asking for a perspective, not offering one. And funnily enough, since then I've kind of become convinced that MongoDB has solid use cases. Anyway, I personally find no use for it; maybe someday I will. :-)

 

This was one of the most entertaining reads of the week! :D

 

Thank you, David! But I'd invite you to take sides (there's no fun without sparring!). Are you with or against MongoDB? :)

 

Alright then :). I take the side against NoSQL solutions. Why so: SQL (and relational by extension) is over 30 years old, battle-hardened and proven to work (even at web scale), whereas NoSQL is relatively new and already losing traction. It came, it saw, it failed to actually SOLVE the non-existent problems that SQL supposedly hadn't. Bonus: most modern SQL engines have native JSON storage ability. Boom, JSON document stores.

 

Yes, sir! The entertainment was intentional. 😊

 

1 and 5 descend from the same old assumption: every single piece of data is its own small miracle. That is sooo 90's, when everybody wanted to be MicroXerOraclAppIbm-SAP, and BPM/CRM/Office was everything a commercial software could ever dream to be. Losing an invoice would be DEATH. We were that naive then.
And then came the late 2010's. Everybody was using modern statistics and wanted to be the next AmaFaceGoogUberFlix-fy. Incomplete, incorrect, out-of-standard "broken" strings of bytes were now called "edge-case data-points". If your application is based on large amounts of things business once called "data crap", you have to wrestle one inconsistency or a billion. There came Mongo. Mongo's use case? Your use case. Except if you are SAP. Then you should use something field-proven, something reliable that the entire world takes for granted. Like smoking in hospitals and CDs.

 

I'm afraid I didn't get much out of this comment.

Losing an invoice would be DEATH. We were that naive then.

Now, we should just use MongoDB and stop caring about lost invoices? :)

Mongo use-case? Your use case.

What? How? What do I gain by moving on to MongoDB?

 
 

The way I see it: if your data has a fixed schema, an RDBMS will do a better job, as it will enforce that schema for you. If not, NoSQL is probably better. If you're ever thinking an EAV structure is the answer to your problem, try NoSQL for that problem instead.
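For readers unfamiliar with the EAV comparison, here is a minimal sketch (entity and attribute names invented) showing the same data as entity-attribute-value rows and folded into per-entity documents, which is roughly the shape a document store would hold natively:

```javascript
// EAV: one row per (entity, attribute, value) triple, as you might
// model dynamic attributes in a relational table.
const eavRows = [
  { entity: 'user1', attribute: 'name',  value: 'Ada' },
  { entity: 'user1', attribute: 'email', value: 'ada@example.com' },
  { entity: 'user2', attribute: 'name',  value: 'Alan' }
];

// Fold the triples into one document per entity.
const docs = {};
for (const { entity, attribute, value } of eavRows) {
  if (!docs[entity]) docs[entity] = { _id: entity };
  docs[entity][attribute] = value;
}

console.log(docs.user1);
// prints: { _id: 'user1', name: 'Ada', email: 'ada@example.com' }
```

The document form needs no pivoting to read an entity back, which is the main pain point EAV schemas run into in SQL.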

 

Yup, but some examples from your personal experience would really help! :-)

 

I think there is also a misuse issue with both of these technologies. Sometimes I am thinking about a personal app and ask myself, why not use schemaless Firebase? Then the app grows in functionality and I ask myself, "crap, I should have started with an RDBMS", and then flush it down and never think of the idea again. It is kind of frustrating. In general I know that an RDBMS solves a lot of problems, but sometimes you'd like to just have a simple storage manager like MongoDB, and maybe push the data up to your RDBMS for reporting or whatever else. We could use the best of both, in my opinion, and in the end, no matter which one you use, if you know the consequences and the pros you will be able to pick the best-suited technology.

 

Almost exactly my thoughts. I don't understand the rabid popularity of the MEAN stack. Is MongoDB just plain, aggressive marketing, or is there some real substance to it? I've heard of very senior developers who choose MongoDB for every project unless you can convince them otherwise. Which is great, but I wish I could see the advantages clearly. Hence this post. :)

 

The only time I use MongoDB is for quick prototyping on side projects. Even then I prefer storing data on memory, but if it's too much I resort to MongoDB. It's not applicable on actual business projects, but it's a great tool for learning.

 

Thank you. Just to extend the discussion further, I'm now beginning to think the opposite. 😂 I'm going to use MongoDB more and more and see if it bites back. 😊

 

Well, I enjoyed this post and comments section too.
Still, I'm a big fan of MongoDB, as it helped us get some queries almost 20X faster than MySQL.
And the power which comes with the aggregation pipeline can't be phrased in words.
I won't comment on inserts, but in cases where we have to join multiple tables and search, I have found MongoDB to be much, much faster than an RDBMS ever can be, simply by embedding the rows of different tables as subdocuments in an array.
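To show what "embedding rows as subdocuments" means in practice, here is a minimal sketch (collection and field names invented) of an embedded order document, together with a plain-JS imitation of a `$unwind` + `$group` aggregation stage:

```javascript
// Orders with their line items embedded, instead of a separate
// order_items table joined by foreign key.
const orders = [
  { _id: 1, customer: 'Ada',  items: [{ sku: 'a', qty: 2 }, { sku: 'b', qty: 1 }] },
  { _id: 2, customer: 'Alan', items: [{ sku: 'a', qty: 5 }] }
];

// Roughly what this hypothetical pipeline would compute on a real server:
//   db.orders.aggregate([
//     { $unwind: '$items' },
//     { $group: { _id: '$items.sku', total: { $sum: '$items.qty' } } }
//   ])
const totals = {};
for (const order of orders) {
  for (const item of order.items) {                         // $unwind
    totals[item.sku] = (totals[item.sku] || 0) + item.qty;  // $group / $sum
  }
}

console.log(totals); // prints: { a: 7, b: 1 }
```

Because the line items live inside the order document, reading one order needs no join at all; the trade-off is that cross-order reporting goes through pipelines like the one sketched above.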

 

And the power which comes with the aggregation pipeline can't be phrased in words.

Hmmm ... very interesting!

[...] cases in which we have to join multiple tables and search in that particular case I have found MongoDB to be much much faster than RDBMS ever can be [...]

Wow, that's really exciting. I'd say by now I'm more or less convinced to start learning MongoDB more seriously and use it in as many projects as possible.

 

This debate became overhyped, like many of the technologies it refers to.

If you are starting a NEW project TODAY, you have no reason to go with an SQL DB rather than the one mentioned in this article's TL;DR:
medium.com/@orenyakobi/choosing-be...

 

Yup, I'm now beginning to think that way. But given that MongoDB also offers distributed read locks across documents, I'd say it makes sense for use in financial data also. O.o Unless, there's something critical I'm missing . . .

 

I disagree. I think there are some good use cases for MongoDB.

For example, applications like Strapi where the user can create their own data structures at runtime are a great fit for MongoDB.

Also, for logging: while you could use an RDBMS, it will only work if your data structure is fixed. For SaaS applications like ServerDensity, responsible for log aggregation from many clients which can have logs in multiple formats, a schemaless datastore like Mongo is perfect.

Activity feeds, Analytics, IoT data and all the cases where the Data Schema is not known in advance or changes very rapidly.

Some important things to remember about RDBMSes: writes and joins are, in general, expensive.

I agree with you on the MEAN stack. I think "Mongo and Non relational by default" is probably not the best fit for common use cases.

But one fact that, I believe, makes more people go down that route is SaaS like Firebase, DynamoDB etc., which have a great free tier, thus making them perfect for smaller applications, personal projects or MVPs.

I don't know any "RDBMS as a Service" that is free to use. Services like Amazon RDS or Google Cloud SQL are very expensive.
The only one that comes close is Heroku Postgres, but it's tied to Heroku.

In conclusion, while I believe non-relational is being abused a little and it's not the best fit for many common use cases, databases like MongoDB definitely have their place and are quite useful in some contexts.

 

A bit late to the discussion, but I have some insight that might be valuable to others.

I spent a year and a half implementing a data ingestion pipeline backed by MongoDB. Data comes in from relational sources (clients' data) and this system's job was to store it, then aggregate and normalize it into a form our product can use. This is a big task because the data we take in can have different schemas depending on the client.

The way I ended up building this system was essentially creating a kind of instruction document that told the system in each case how to map, aggregate and join the data. There would be one of these instruction documents for each data feed that came into the system, and it was essentially a stored procedure/ query language.

Now, before I talk about the takeaways from this experience I want to say that I was only following instructions... it was my first job after college and I didn't really have the experience to see the red flags... but there were certainly red flags.

Anyway... I basically built a database engine on top of MongoDB, and it was a very difficult and complicated task. All the data type validation happens in code, as well as all the joining of data, which seems pretty silly. I will be the first to admit that we are using Mongo to process relational data, and it seems backwards.

However, we can now put all the processing work in a web server, which is far cheaper, and this new system has the potential to cost half as much running the same load as our current SQL Server system. On top of that, those documents that tell our system how to process the data can be made by users through a web interface, so the import process can be self-service. That's like letting your clients write stored procedures for your system, but not insane like that would be. This flexibility comes from doing all that RDBMS stuff in code.

One of the drawbacks of this design is the insane amount of work it has been to get it working. This project is 10,000+ lines, and all it does is aggregate and join data dynamically depending on the source. I am not sure how many lines of stored procedures it is replacing, though, but it's no trivial amount.

Also... MongoDB's query syntax sucks. It's so painful to write freaking JSON to filter documents. I miss SQL for that reason alone. That said, most of the big nasty queries are in the code, so once they are written it's done.

MongoDB probably wasn't the ideal solution for this project (I wonder why a more scalable SQL database wasn't chosen), but I see the reasons why my boss decided to use it. It's going to be much cheaper than spinning up a new SQL Server instance when we add new clients.

 

A bit late to the discussion, but I have some insight that might be valuable to others.

Nah, you're never too late for a MongoDB debate. Ironically, every time I convince myself one way or another, a new argument/experience arises and makes me think. So, I'm extremely thankful you took out the time. 🙂

this system's job was to store it and then aggregate it and normalize it into a form our product can use

Sounds exactly like the thing MongoDB should excel at, if other people's comments here are anything to go by. 🤔

All the data type validation happens in code, as well as all the joining of data, which seems pretty silly.

I wouldn't say that just because something seems silly it is indeed silly.

I wonder why a more scalable SQL database wasn't chosen

Such as? The only offerings that come to mind are CockroachDB, Amazon RDS, etc., but they are prohibitively expensive. Plus, how do you achieve the "self-service" uploads that, as you say, were done so elegantly with MongoDB?

Some of the drawbacks of this design are the insane amount of work it has been to get it working.

🎵🎵 _Nobody said it was easy ..._ 🎵🎵 (Coldplay style) 😛

Also... MongoDB's query syntax sucks.

Well, I can live with that.

All in all, I'd say you've made a very strong case for MongoDB. Thanks for adding to the discussion. 🙃🙃

 

MongoDB uses JavaScript!

Your response to this (the meme) was GOLD. It was kinda what was in my mind when I saw the heading. 🤣🤣

 
 

"Code PHP and JS for a living"

... 🙄

Naw just joshing ya'

That was a fun article, reminds me of the dev ops guy at my last company. Thanks for the perspective.

 
 
 

Thank you! The major driving force behind this was fun. And of course, learning something new doesn't hurt either. :)

 

I'm open to being abused, as long as you're able to back it up with something sound

You provided no data, evidence or references to anything that backs up your claims.

 
 

MongoDB is written in C++, not JavaScript. It does have a JavaScript shell though.

 

To be fair, I said it uses JavaScript. Maybe I got the sense wrong. I just meant to say that you don't need to learn a new syntax as pretty much everything is JavaScript-based.

 

IMHO: the main flaw of Mongo is that it needs its indexes to fit into memory.

 

You just need hits for Google Analytics. BTW, if you need schemas to provide integrity, I believe you're writing garbage code.

 

You just need hits for google analytics.

Huh? You mean quick writes, increment a counter, and the task is over? So, I guess under high write loads, relational databases keel over and die?

BTW if you need schemas to provide integrity I believe your are writing garbage code.

But you have to put integrity somewhere, in the schema or in the code.

 
[deleted]
 

I want neither your email nor your arrogant comment. Please delete it.
