re: What are the most suitable datastores for storing a huge number of articles and news?

 

How massive? Why not a relational DB like PostgreSQL? I'm not saying it's the right choice, but what was the process of elimination?

You can store huge quantities of data, you can run data analysis (on the live database or, even better, on a read replica), and full-text search is supported. If the built-in search is too limited, you can still use Elasticsearch just for search.
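For what it's worth, here is a rough sketch of what the built-in full-text search could look like. The `articles` table, its columns, the index name and the `build_search_query` helper are all made up for illustration; only the PostgreSQL pieces (`tsvector`, `to_tsvector`, `plainto_tsquery`, `ts_rank`, the `@@` operator, GIN indexes and generated columns in PostgreSQL 12+) are real. The Python part only builds a parameterized query, so you would hand the result to whatever driver you use.

```python
from typing import Any, Tuple

# Hypothetical schema: table, column and index names are assumptions.
CREATE_ARTICLES = """
CREATE TABLE articles (
    id         bigserial PRIMARY KEY,
    source     text NOT NULL,
    title      text NOT NULL,
    body       text NOT NULL,
    -- Generated column (PostgreSQL 12+) keeps the search vector in sync.
    search_vec tsvector GENERATED ALWAYS AS (
        to_tsvector('english', title || ' ' || body)
    ) STORED
);
"""

# A GIN index makes the @@ match below fast on large tables.
CREATE_INDEX = (
    "CREATE INDEX articles_search_idx ON articles USING GIN (search_vec);"
)


def build_search_query(terms: str, limit: int = 20) -> Tuple[str, Tuple[Any, ...]]:
    """Return (sql, params) for a ranked full-text search.

    The %s placeholders follow the style used by common PostgreSQL
    drivers; the user input is never interpolated into the SQL text.
    """
    sql = (
        "SELECT id, title, ts_rank(search_vec, query) AS rank "
        "FROM articles, plainto_tsquery('english', %s) AS query "
        "WHERE search_vec @@ query "
        "ORDER BY rank DESC LIMIT %s"
    )
    return sql, (terms, limit)
```

With a driver you would run something like `cursor.execute(*build_search_query("election results"))`. If ranking or language handling turns out to be too basic, that is the point where a dedicated search engine starts to make sense.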

It's definitely easier to set up and handle than Cassandra...

 

How massive? There is no specific number, but from what I can tell we may be asked to collect everything we can from major news sites like BBC and CNN, plus some other blogs and news sites.

Why not PostgreSQL (or a relational DB)? Actually, there is no reason, and I am currently looking at CitusData as an option. Another option is Postgres-XL.

The concerns with relational databases are the size of the data, how easy it is to scale and add new nodes, and high availability, which NoSQL databases provide by default. That is why we give NoSQL DBs a higher priority.

 

How massive? There is no specific number, but from what I can tell we may be asked to collect everything we can from major news sites like BBC and CNN, plus some other blogs and news sites.

I would consider an alternative to PostgreSQL once you are in the hundreds of millions of articles, but even then it depends on what you do with the data :D

Why not PostgreSQL (or a relational DB)? Actually, there is no reason, and I am currently looking at CitusData as an option.

I've heard about it from some colleagues. Check what limitations it has, because it's not exactly like PostgreSQL. I just checked the website: they have been purchased by Microsoft, eheh

The concerns with relational databases are the size of the data, how easy it is to scale and add new nodes, and high availability, which NoSQL databases provide by default. That is why we give NoSQL DBs a higher priority.

Gotcha. Obviously, keep the tradeoffs in mind.

In any case, I would separate the search, due to size requirements, from the "single source of truth" DB.
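To make the separation concrete, here is a toy sketch of the idea. Everything in it is a stand-in: `ArticleStore` plays the role of the primary database, `SearchIndex` plays the role of a separate search service (Elasticsearch or similar), and the function names are invented for the example. The point is only the shape: the index holds tokens and IDs, and full articles are always read back from the source of truth.

```python
import re
from collections import defaultdict


class ArticleStore:
    """Stand-in for the primary database (the single source of truth)."""

    def __init__(self):
        self._rows = {}

    def save(self, article_id, title, body):
        self._rows[article_id] = {"title": title, "body": body}

    def get(self, article_id):
        return self._rows[article_id]


class SearchIndex:
    """Stand-in for a separate search service; stores tokens and IDs only."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, article_id, text):
        for token in re.findall(r"\w+", text.lower()):
            self._postings[token].add(article_id)

    def search(self, query):
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        # An article matches only if it contains every query token.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


def ingest(store, index, article_id, title, body):
    """Write to the source of truth first, then update the search index."""
    store.save(article_id, title, body)
    index.index(article_id, title + " " + body)


def search_articles(store, index, query):
    """Resolve matching IDs from the index, then load rows from the store."""
    return [store.get(i) for i in sorted(index.search(query))]
```

Because the index can always be rebuilt from the store, losing or re-sharding the search side does not put the articles themselves at risk, and each side can be sized independently.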
