Discussion on: Things I Learned Building an Analytics Engine


Any recommendations for a relational database that scales to large datasets and still gives quick queries?

You can migrate to MariaDB and scale your DB. It has multiple storage engines and supports master-to-master replication, sharding, or the traditional master-slave setup. With its columnar storage engine, I think it can handle petabytes of data.

But the big analytics players use simpler key-value or columnar stores, so they can scale horizontally. Collecting raw events and running batch jobs that aggregate and enrich them works better than trying to squeeze performance out of a single SQL query.
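To make the "collect events, then crunch them" idea concrete, here is a minimal Python sketch. The event fields (`site`, `page`, `ts`) and the `aggregate_daily_views` function are hypothetical; a real pipeline would run this kind of roll-up as a scheduled batch job over an append-only event store, not over an in-memory list.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw events: append-only, no joins, one flat record per hit.
events = [
    {"site": "a.com", "page": "/", "ts": 1540080000},
    {"site": "a.com", "page": "/about", "ts": 1540083600},
    {"site": "b.com", "page": "/", "ts": 1540166400},
    {"site": "a.com", "page": "/", "ts": 1540170000},
]

def aggregate_daily_views(events):
    """Batch job: roll raw events up into (site, UTC day) -> view count."""
    counts = Counter()
    for e in events:
        day = datetime.fromtimestamp(e["ts"], tz=timezone.utc).date().isoformat()
        counts[(e["site"], day)] += 1
    return dict(counts)

# Dashboards then read these small pre-aggregated rows instead of
# running a heavy GROUP BY over the raw events on every request.
print(aggregate_daily_views(events))
```

The design choice is to pay the aggregation cost once per batch rather than once per dashboard query.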

Side projects are fun for us devs; the problem arises when we want to make money from them. Then comes all the stuff we don't want to handle, from laws to marketing, from customer support to hosting bills.

Doug Black • Oct 21 '18

Thank you! I went with Percona off the bat for this one...it's just soooooo fast.

Tell me more about the key-value solutions! This may be just what I'm looking for!

Adrian B.G. • Oct 21 '18

At an abstract level:

By getting rid of relationships and using simple, self-contained documents, you can shard much more easily, especially with storage systems built for it, like Cassandra.
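A minimal sketch of why self-contained documents shard easily: each one can be routed by hashing a single partition key. The `shard_for` helper, the document fields, and the 4-shard cluster are hypothetical, and this uses plain modulo hashing, not Cassandra's actual token-ring partitioner.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard with a stable hash.

    md5 is used only for even, deterministic spreading (not security);
    Python's built-in hash() is salted per process for strings, so it
    would route the same key differently after a restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The document carries everything it needs, so it can live on any shard:
# no foreign keys pointing at rows that might sit on another machine.
doc = {"event_id": "evt-123", "site": "a.com", "page": "/", "ts": 1540080000}
print(shard_for(doc["event_id"]))
```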

Sharding a SQL database usually requires getting rid of relationships and JOINs. Even when it doesn't, cross-shard queries add overhead, because data has to be fetched from several shards and then grouped, in a cascading effect.
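The overhead of cross-shard queries can be seen in a toy scatter-gather: even a simple GROUP BY needs one round trip per shard plus a merge step on a coordinator, and a JOIN whose matching rows live on other shards would need further lookups on top of that. The shard contents and function names below are hypothetical, and the loop is sequential for clarity where a real coordinator would query shards in parallel.

```python
from collections import Counter

# Hypothetical: three shards, each holding a slice of the (site, page) rows.
shards = [
    [("a.com", "/"), ("b.com", "/")],
    [("a.com", "/"), ("a.com", "/about")],
    [("b.com", "/pricing")],
]

def local_count_by_site(rows):
    """Runs on each shard: a partial GROUP BY site."""
    return Counter(site for site, _page in rows)

def scatter_gather_count_by_site(shards):
    """Coordinator: fan the query out to every shard, then merge the partials."""
    total = Counter()
    queries = 0
    for rows in shards:  # one round trip per shard
        total.update(local_count_by_site(rows))
        queries += 1
    return dict(total), queries

print(scatter_gather_count_by_site(shards))
```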

If the sharding algorithm has to take data relationships into account and tries to keep related data as local as possible, you will end up with hot spots and unbalanced shards.
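A small simulation of the hot-spot problem, assuming hypothetical skewed traffic where one site produces 90% of the events. Sharding by `site` keeps each site's data together (good for locality) but piles that big site onto a single shard; sharding by `event_id` spreads the load evenly but scatters a site's data across every shard.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash routing (md5 for even spreading, not security)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical skewed traffic: "big.com" produces 900 of 1000 events.
events = [
    {"event_id": f"evt-{i}", "site": "big.com" if i % 10 else f"small{i}.com"}
    for i in range(1000)
]

def shard_sizes(events, key, num_shards=4):
    """Count how many events land on each shard when routed by `key`."""
    return Counter(shard_for(e[key], num_shards) for e in events)

by_site = shard_sizes(events, "site")       # locality-preserving, but skewed
by_event = shard_sizes(events, "event_id")  # balanced, but no locality
print("by site: ", sorted(by_site.values(), reverse=True))
print("by event:", sorted(by_event.values(), reverse=True))
```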

I'm not saying it's impossible to scale SQL; I'm saying it will be harder and more expensive. If you can afford Google's Spanner, a big Vitess setup, or 5-8 big servers behind a Galera cluster, go ahead!

Bottom line: if you want to go beyond a few TBs of data, I would suggest rethinking your structure in a columnar, less SQL-ish way.
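As a toy illustration of "thinking in columns", here is the same data pivoted from rows into one list per column, so an aggregate over a single field only has to scan that field. The records and the `to_columns` helper are hypothetical, and real columnar stores add compression and on-disk layouts that this sketch ignores.

```python
# Hypothetical page-view records, stored row by row as in a typical OLTP table.
rows = [
    {"site": "a.com", "page": "/", "duration_ms": 1200},
    {"site": "a.com", "page": "/about", "duration_ms": 800},
    {"site": "b.com", "page": "/", "duration_ms": 400},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = to_columns(rows)

# An analytics query like AVG(duration_ms) scans one contiguous list
# instead of reading every field of every row.
avg_duration = sum(columns["duration_ms"]) / len(columns["duration_ms"])
print(avg_duration)
```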