re: MongoDB has no use case

As with lots of other non-relational databases, as well as plenty of big data technologies, trying to apply them to your regular transactional problems leads nowhere, since they're not meant for that.

The first thing you should consider is that NoSQL as a whole is meant for large volumes of data that won't lead to someone's death if a few records are inconsistent. I know it's a hard thing to imagine, but it's a thing that happens.

Also, building on Duke's answer, large-scale scenarios lead to multiple databases running for multiple purposes. Fast-changing data and schemas, as in analytical processes, can gain a lot from this kind of structure, since new data can be loaded and used faster than with an RDBMS.
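
For what it's worth, here's a minimal sketch of what that looks like in practice, assuming a local MongoDB instance, pymongo, and a made-up page_views collection: documents with brand-new fields can be stored as they arrive, with no migration step in between.

```python
from pymongo import MongoClient

# Hypothetical database and collection names, default local connection.
collection = MongoClient("mongodb://localhost:27017")["analytics"]["page_views"]

# Early in the experiment, events only carry a URL and a timestamp.
collection.insert_one({"url": "/home", "ts": "2019-05-01T10:00:00Z"})

# Later, a new data source adds referrer and device information.
# The new shape is stored alongside the old one -- no ALTER TABLE, no backfill.
collection.insert_one({
    "url": "/pricing",
    "ts": "2019-05-02T09:30:00Z",
    "referrer": "newsletter",
    "device": "mobile",
})
```

In most relational setups, the same change would typically mean a migration (and possibly a backfill) before the new rows could land.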

I've used MongoDB (and other document databases) for fairly small personal projects, but the development speed, especially considering the analytical scenarios I mentioned, was definitely worth it.


The first thing you should consider is that NoSQL as a whole is meant for large volumes of data that won't lead to someone's death if a few records are inconsistent. I know it's a hard thing to imagine, but it's a thing that happens.

Awesome! I feel like I should get these words framed in my living room. Maybe I actually will. :)

Also, building on Duke's answer, large-scale scenarios lead to multiple databases running for multiple purposes. Fast-changing data and schemas, as in analytical processes, can gain a lot from this kind of structure, since new data can be loaded and used faster than with an RDBMS.

Ah, I see. I now remember someone saying that if they wanted to store analytics data, they'd use MongoDB, and it kind of makes sense now. So the benefit is that when we have to rename (columns?), we can do that immediately without having to wait for table locks and all? Also, why do schema changes emerge naturally in analytics? And what about all the checks the code has to perform to insert/retrieve this data?


Especially when considering data science, when you work with predictions of any kind, your aim is to minimize the error in an efficient way. That means you can't simply decide that, to create the best model, you're going to analyse every piece of data available.

What you do is run little experiments, grabbing all the data you have available, which sounds great in theory; in practice, though, you're bound to go back and fetch a new data source every now and then, for multiple reasons: website downtime, legacy systems that needed to expose endpoints to extract the data, data governance policies that delayed the extraction, and so on.

With all that going on, you're left with an iterative approach where the schema keeps changing indefinitely.
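
On your question about the checks the code has to perform: yes, that cost moves from the database into the application. A rough sketch, again assuming pymongo and the same hypothetical collection, of a read path that tolerates documents whose shape has drifted over time:

```python
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["analytics"]["page_views"]

for doc in collection.find():
    # Older documents predate the referrer/device fields, so the read side
    # falls back to defaults instead of relying on a fixed schema.
    referrer = doc.get("referrer", "unknown")
    device = doc.get("device", "unknown")
    print(doc["url"], doc["ts"], referrer, device)
```

That flexibility is exactly the trade-off: the schema ends up living in the reading code rather than in the database.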

Just a note: I'm currently using Postgres for an analytical experiment, and it's working just fine, especially because I've limited the scope I'm working with (I don't have the time to extract any other data), which means few schema changes. So again, no silver bullet, it always depends on your scenario :) And holy crap I'm loving that hashtag hahaha
