The expectation being that everything should return in seconds and if it does...

[Comment from a deleted post]

Replies for: Billion row table joins lol 😂

The expectation being that everything should return in seconds, and if it doesn't, the database must be the issue.

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.
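
To make the "filters and aggregation" point concrete, here's a rough sketch of the kind of rewrite I mean. It uses Python's built-in sqlite3 as a stand-in for the real warehouse or Spark, and the tables (`customers`, `orders`) and their columns are made up purely for illustration. The first query joins everything and leaves the filtering and summing to the application; the second filters and aggregates the big table first, so the join only sees the reduced rows.

```python
import sqlite3

# In-memory database with two hypothetical tables (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, year INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "EU"), (2, "EU"), (3, "US")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 2019, 10.0), (1, 2020, 5.0), (2, 2020, 7.5),
                  (3, 2020, 20.0), (3, 2020, 1.0)])

# Naive: join every row, pull it all back, then filter and sum in application code.
naive_rows = conn.execute("""
    SELECT c.region, o.year, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()
naive_totals = {}
for region, year, amount in naive_rows:
    if year == 2020:
        naive_totals[region] = naive_totals.get(region, 0.0) + amount

# Rewritten: filter and aggregate orders first, then join the much smaller result.
rewritten_totals = dict(conn.execute("""
    SELECT c.region, SUM(o.total)
    FROM customers c
    JOIN (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        WHERE year = 2020
        GROUP BY customer_id
    ) o ON o.customer_id = c.id
    GROUP BY c.region
""").fetchall())

print(naive_totals)
print(rewritten_totals)
```

Both give the same totals per region. On a toy table the difference is invisible, but the second form is the shape that tends to matter once the row counts get large, whatever engine sits underneath.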

geraldew • Feb 18 '20

Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough, but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street, while being unsure whether you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)

Maxime Moreau • Mar 11 '20

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.

I've faced many issues with this... Developers are using PySpark and they're blindly writing shitty code. That's a huge problem.