re: MongoDB has no use case

 

A bit late to the discussion, but I have some insight that might be valuable to others.

I spent a year and a half implementing a data ingestion pipeline backed by MongoDB. Data comes in from relational sources (clients' data), and this system's job was to store it and then aggregate it and normalize it into a form our product can use. This is a big task because the data we take in can have a different schema for each client.

The way I ended up building this system was to create a kind of instruction document that tells the system how to map, aggregate and join the data in each case. There is one of these instruction documents for each data feed that comes into the system, and the format is essentially a stored procedure/query language.
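To make the idea concrete, here is a minimal sketch of what such an instruction document and the code that applies it could look like. It is not the real system: the keys (`source_field`, `target_field`, `group_by`, `sum`), the feed name and the `apply_feed_spec` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical instruction document for one data feed (all keys are assumptions).
FEED_SPEC = {
    "feed": "client_a_orders",
    "mappings": [
        {"source_field": "CustID", "target_field": "customer_id", "type": "int"},
        {"source_field": "Amt", "target_field": "amount", "type": "float"},
    ],
    "aggregate": {"group_by": "customer_id", "sum": "amount"},
}

# Type names allowed in a spec, mapped to the Python callables that cast values.
CASTS = {"int": int, "float": float, "str": str}


def apply_feed_spec(spec, rows):
    """Rename and cast fields per the spec, then sum one field per group."""
    mapped = []
    for row in rows:
        out = {}
        for m in spec["mappings"]:
            out[m["target_field"]] = CASTS[m["type"]](row[m["source_field"]])
        mapped.append(out)

    agg = spec["aggregate"]
    totals = defaultdict(float)
    for row in mapped:
        totals[row[agg["group_by"]]] += row[agg["sum"]]
    return dict(totals)


# Raw rows as they might arrive from a client's relational export.
rows = [
    {"CustID": "1", "Amt": "10.5"},
    {"CustID": "2", "Amt": "3.0"},
    {"CustID": "1", "Amt": "4.5"},
]
result = apply_feed_spec(FEED_SPEC, rows)
```

Since the spec is plain data, a document like `FEED_SPEC` could be stored in a collection alongside the feed it describes, and a new client would only need a new document rather than new code.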

Now, before I talk about the takeaways from this experience, I want to say that I was only following instructions... it was my first job after college and I didn't really have the experience to see the red flags... but there were certainly red flags.

Anyway... I basically built a database engine on top of MongoDB, and it was a very difficult and complicated task. All the data type validation happens in code, as well as all the joining of data, which seems pretty silly. I will be the first to admit that we are using Mongo to process relational data and it seems backwards.
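As a rough illustration of what "validation and joining in code" means in practice, here is a hypothetical sketch (not the project's actual code; `validate`, `hash_join` and the sample records are made-up names and data). It checks field types by hand and does an inner join by building an in-memory index, which is work an RDBMS would normally do for you.

```python
def validate(row, schema):
    """Raise TypeError if a field is missing or has the wrong Python type."""
    for field, expected in schema.items():
        if not isinstance(row.get(field), expected):
            raise TypeError(
                f"{field!r} should be {expected.__name__}, got {row.get(field)!r}"
            )
    return row


def hash_join(left, right, key):
    """Inner join two lists of dicts on `key` by indexing the right side first."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            # Merge both records; on a name clash the right side wins.
            joined.append({**l, **r})
    return joined


customers = [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]
orders = [{"customer_id": 1, "amount": 15.0}, {"customer_id": 3, "amount": 7.0}]

for c in customers:
    validate(c, {"customer_id": int, "name": str})
for o in orders:
    validate(o, {"customer_id": int, "amount": float})

# Only the order for customer 1 has a matching customer record.
joined = hash_join(orders, customers, "customer_id")
```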

However, we can now put all the processing work in a web server, which is far cheaper, and this new system has the potential to run the same load as our current SQL Server system at half the cost. On top of that, the documents that tell our system how to process the data can be created by users through a web interface, so the import process can be self-service. That's like letting your clients write stored procedures for your system, but without the insanity that would normally involve. This flexibility comes from doing all that RDBMS stuff in code.

Some of the drawbacks of this design are the insane amount of work it has been to get it working. This project is 10,000+ lines and all it does is aggregate and join data dynamically, depending on the source. I am not sure how many lines of stored-procedure code it is replacing, but it's no trivial amount.

Also... MongoDB's query syntax sucks. It's so painful to write freaking JSON to filter documents. I miss SQL for that reason alone. That said, most of the big nasty queries are in the code, so once they are written it's done.
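For anyone who hasn't written one, here is a small side-by-side of the same filter as a MongoDB query document and as SQL. The collection and field names are made up for illustration; `$gte` and `$in` are standard MongoDB query operators.

```python
# Hypothetical filter: orders of at least 100 whose status is "open" or "pending".
mongo_filter = {
    "amount": {"$gte": 100},
    "status": {"$in": ["open", "pending"]},
}

# The same condition written as SQL, for comparison.
sql_query = (
    "SELECT * FROM orders "
    "WHERE amount >= 100 AND status IN ('open', 'pending')"
)

# With pymongo, the filter would be passed to find(), e.g.:
#   db.orders.find(mongo_filter)
```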

MongoDB probably wasn't the ideal solution for this project (I wonder why a more scalable SQL database wasn't chosen), but I see the reasons why my boss decided to use it. It's going to be much cheaper than spinning up a new SQL Server instance when we add new clients.

 

A bit late to the discussion, but I have some insight that might be valuable to others.

Nah, you're never too late for a MongoDB debate. Ironically, every time I convince myself one way or another, a new argument/experience arises and makes me think. So, I'm extremely thankful you took out the time. 🙂

this system's job was to store it and then aggregate it and normalize it into a form our product can use

Sounds exactly like the thing MongoDB should excel at, if other people's comments here are anything to go by. 🤔

All the data type validation happens in code, as well as all the joining of data, which seems pretty silly.

I wouldn't say that just because something seems silly it is indeed silly.

I wonder why a more scalable SQL database wasn't chosen

Such as? The only offerings that come to mind are CockroachDB, Amazon RDS, etc., but they are prohibitively expensive. Plus, how do you achieve the "self-servicing" uploads, as you say were done so elegantly by MongoDB?

Some of the drawbacks of this design are the insane amount of work it has been to get it working.

🎵🎵 _ Nobody said it was easy ... _ 🎵🎵 (Coldplay style) 😛

Also... MongoDB's query syntax sucks.

Well, I can live with that.

All in all, I'd say you've made a very strong case for MongoDB. Thanks for adding to the discussion. 🙃🙃
