DEV Community


Discussion on: PostgreSQL vs MongoDB

Dave Parr

I sort of agree. But having had to work with both Postgres and Mongo in the same company, on the same product, as a Data Scientist, my experience was more: "Do I actually have to go to MongoDB to get [business report]? Can't we work around it by [getting data from api/event bus]?"

The reason is that most data science work relies on relations between data. We're not looking for the 'one record that fits these criteria', we're trying to capture the full scope of nearly all the data. I don't think I've ever done a job that didn't need 50-99% of all the records in a specific table, related to 1-10 other tables. Doing that kind of work on Mongo is sort of a pain.
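To make the "most of one table, related to others" point concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `users` and `orders` tables (and the `revenue_per_user` helper) are made up for illustration, not taken from any real product.

```python
import sqlite3

def revenue_per_user(conn):
    """Join every order to its user and aggregate (hypothetical schema)."""
    query = """
        SELECT u.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS revenue
        FROM users AS u
        JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.name
    """
    return conn.execute(query).fetchall()

# In-memory database with two small, invented tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")
report = revenue_per_user(conn)
```

The query touches every row of `orders` and relates it to `users` in one statement, which is the shape of most reporting work I've had to do.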

The other aspect is that most data science processes focus on matrix-like objects. Python tends towards pandas DataFrames and NumPy arrays for object representation, which are relatively analogous to a table in a database, and R has this kind of feature built into the base language with data.frames. Most ML will assume you pass it table-like records.
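As a rough illustration of that gap, this sketch flattens nested, document-style records into the rows-and-columns shape most ML code expects. The `flatten` and `to_table` helpers and the sample documents are hypothetical; in practice `pandas.json_normalize` does a similar job.

```python
def flatten(doc, prefix=""):
    """Turn a nested dict into a flat dict with dotted column names."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

def to_table(docs):
    """Return (columns, matrix); missing fields become None."""
    rows = [flatten(d) for d in docs]
    columns = sorted({c for r in rows for c in r})
    matrix = [[r.get(c) for c in columns] for r in rows]
    return columns, matrix

# Invented documents: the second one is missing a field, as documents often are.
docs = [
    {"user": {"name": "ana", "age": 31}, "plan": "pro"},
    {"user": {"name": "bo"}, "plan": "free"},
]
columns, matrix = to_table(docs)
```

The missing `user.age` in the second document shows up as a `None` cell, which is exactly the kind of thing you then have to clean before modelling.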

There may be an argument that human-written, free-text NLP is a good fit for Mongo. However, you will probably still start an NLP project with tables of word frequencies, like in this project by my buddy Dom, where he predicts the gross price of theatre tickets by doing NLP on reviews of different shows.
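For a sense of what "tables of word frequencies" means, here is a small sketch using only the standard library. The reviews are invented and `term_frequency_table` is a hypothetical helper; it is not how Dom's project does it.

```python
from collections import Counter
import re

def term_frequency_table(texts):
    """One row per text, one column per word, cells are word counts."""
    counts = [Counter(re.findall(r"[a-z']+", t.lower())) for t in texts]
    vocab = sorted(set().union(*counts))
    table = [[c[w] for w in vocab] for c in counts]
    return vocab, table

# Invented example reviews.
reviews = ["A brilliant, brilliant show", "A dull show"]
vocab, table = term_frequency_table(reviews)
```

Even with free text as the input, the thing you end up modelling on is a table again.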
