re: MongoDB has no use case

Good discussion. I recently used MongoDB (in fact, any NoSQL DB) on a project for the first time, after 20 years of RDBMSs (mostly MySQL & MariaDB).

Why did I do that?

The project was to pull down data from a public resource over a rate-limited API: eventually about 8 million "documents", averaging about 10 kB each (including arrays of subdocuments), i.e. ~80 GB of data. Then to clean, restructure, summarise, query, analyse, visualise and report on that data.
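For the rate-limited download, a client-side throttle keeps the fetch loop under the provider's limit. A minimal sketch, assuming the limit is expressed as "N calls per period"; `fetch` stands in for whatever (hypothetical) function actually calls the API:

```python
import time
from collections import deque
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Sliding-window throttle: at most `max_calls` calls in any `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have slid out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)


def fetch_all(ids: Iterable[str], fetch: Callable[[str], T],
              limiter: RateLimiter) -> Iterator[T]:
    """Fetch each record in turn, never exceeding the limiter's rate."""
    for record_id in ids:
        limiter.wait()
        yield fetch(record_id)
```

Usage would be something like `fetch_all(record_ids, fetch_record, RateLimiter(max_calls=10, period=1.0))`, where `record_ids`, `fetch_record` and the limits are all hypothetical. The clock and sleep functions are injectable so the throttle can be tested without actually waiting.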

The data came down as JSON, obviously. The schema was complex (i.e. a lot of fields) and not super consistent (some of it is "legacy" at their end). The data is not "very relational": only about 3-4 main entity types. I wanted to get going quickly and not spend months trying to map the JSON to an SQL schema, only to find that as I pulled down more data, I would have to deal with a never-ending set of edge cases and legacy exceptions. The data hardly changes (it's a historic "archive" of official records); it just gets added to. NB: I am not in control of the schema of the public resource, only of the post-processed fields.
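One way to put a number on "not super consistent" before committing to any schema is to count how often each field path appears across the downloaded records. A small stdlib-only sketch; the record shapes in the test are hypothetical, and paths use the same dot notation MongoDB uses for fields inside arrays of subdocuments (e.g. `items.type`):

```python
from collections import Counter
from typing import Any, Iterable


def field_paths(doc: Any, prefix: str = "") -> set:
    """Return the dotted path of every field in a nested JSON value.

    Arrays are descended into, so fields of subdocuments inside an array
    show up as e.g. 'items.type' rather than 'items.0.type'.
    """
    paths = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(value, path)
    elif isinstance(doc, list):
        for element in doc:
            paths |= field_paths(element, prefix)
    return paths


def field_coverage(docs: Iterable[dict]) -> dict:
    """Fraction of documents in which each field path appears at least once."""
    counts = Counter()
    total = 0
    for doc in docs:
        total += 1
        counts.update(field_paths(doc))
    return {path: n / total for path, n in counts.items()} if total else {}
```

Paths with low coverage are exactly the "legacy" edge cases that would each need a nullable column or a side table in an SQL schema.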

So I decided it might be much easier to get going by just taking the JSON and inserting it directly into MongoDB as documents. Then indexing it, cleaning it, summarising it (adding yet more fields to the documents), analysing it, etc.
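The "insert it as-is, then index and summarise" workflow might look roughly like this with pymongo. This is a sketch under assumptions: the input is one JSON record per line, and `record_date` and `items` are hypothetical field names. The functions accept any pymongo `Collection`, so the module itself does not need to import pymongo:

```python
import json
from typing import Iterable


def load_raw(collection, json_lines: Iterable[str], batch_size: int = 1000) -> int:
    """Insert API records exactly as received, in batches; no schema mapping."""
    batch, inserted = [], 0
    for line in json_lines:
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted


def index_and_summarise(collection) -> None:
    # Ascending index on a field that queries filter on (hypothetical name).
    collection.create_index([("record_date", 1)])
    # Store a derived field in each document, computed server-side with a
    # pipeline-style update (MongoDB 4.2+). Only documents where `items`
    # really is an array are touched, since $size fails on anything else.
    collection.update_many(
        {"items": {"$type": "array"}},
        [{"$set": {"item_count": {"$size": "$items"}}}],
    )
```

With a real server this would be called as e.g. `coll = pymongo.MongoClient()["archive"]["records"]`, then `load_raw(coll, open("dump.jsonl"))` and `index_and_summarise(coll)` (database, collection and file names hypothetical). Derived fields like `item_count` can then be indexed like any other field.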

It's worked quite well. After an initial learning curve, I found MongoDB's query and indexing tools pretty good at dealing with the JSON, much better than what I would have got with a MariaDB JSON object field.
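Part of what makes MongoDB convenient for nested JSON is `$elemMatch` combined with multikey indexes on arrays of subdocuments. A sketch that only builds the query and index documents; the field names (`items`, `type`, `value`) are hypothetical:

```python
def elem_match_query(item_type: str, min_value: float) -> dict:
    """Match documents where a SINGLE element of `items` has this type
    and a value >= min_value."""
    return {"items": {"$elemMatch": {"type": item_type, "value": {"$gte": min_value}}}}


def dotted_query(item_type: str, min_value: float) -> dict:
    """Looser: the two conditions may be satisfied by DIFFERENT elements
    of `items`, which is often not what was intended."""
    return {"items.type": item_type, "items.value": {"$gte": min_value}}


# Compound index on two fields inside the same array. MongoDB marks it as
# multikey automatically once it sees array values. Pass to create_index().
ITEMS_INDEX = [("items.type", 1), ("items.value", 1)]
```

With pymongo that would be `coll.create_index(ITEMS_INDEX)` followed by `coll.find(elem_match_query("x", 10))`.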

I am just running a single mongod instance: no sharding, not even replication yet. I have found that I need more memory than I expected. Running queries off disk is painfully slow when none of my 20+ indexes suits the query at hand. Somehow collection scans seem slower than I would expect a table scan to be in an RDBMS. Does that make sense, given the variable/complex document structure?
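To check whether a slow query is using one of the indexes or falling back to a collection scan, pymongo's `Cursor.explain()` returns the query plan as a dict. A sketch that walks the winning plan looking for a `COLLSCAN` stage; the exact plan layout varies between server versions (newer ones may nest it under `queryPlan`), hence the generic recursive walk rather than a fixed path:

```python
def plan_stages(node) -> list:
    """Collect every 'stage' name in an explain() plan tree, depth first."""
    stages = []
    if isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages


def uses_collection_scan(explain_output: dict) -> bool:
    """True if the winning plan contains a full collection scan."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)
```

Usage would be along the lines of `uses_collection_scan(coll.find({"items.type": "x"}).explain())`, with `coll` and the filter hypothetical. An `IXSCAN` stage means an index was used; `COLLSCAN` means every document had to be read, which on ~80 GB that does not fit in RAM means reading from disk.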

The rest of the app uses an RDBMS, because existing code expects one.

Was this a bad decision? I am not sure yet; the jury is out. It allowed me to do a lot quite quickly. Have I incurred "technical debt"? Not that I know of, yet. I have had to do some splitting of the main collection to keep RAM usage in check, see here:

stackoverflow.com/questions/567340...

Could this have been done with an RDBMS? Sure. Would it have been slower to develop? Probably, yes. Would the end product have been better with an RDBMS? Not that I can see right now (notwithstanding above concerns).

By the way I am currently looking into adding a GraphDB layer over the top to help with some of the analysis.

Sorry, maybe that wasn't passionate or opinionated enough? ;-)

So is this a valid use case for MongoDB/NoSQL? I would say: Yes, probably.

Other opinions?