The expectation is that everything should return in seconds, and if it doesn't, the database must be the issue.
I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation so they perform better.
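To make the "filters and aggregation" point concrete, here is a rough sketch using Python's built-in `sqlite3` rather than Spark, just so it runs anywhere. The `orders` table, its columns, and both function names are made up for illustration. The first function pulls every row and does the work on the client; the second pushes the filter and the aggregation into the query, so only the small result comes back.

```python
import sqlite3

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", 2023, 10.0),
        ("east", 2024, 20.0),
        ("east", 2024, 5.0),
        ("west", 2023, 1.0),
        ("west", 2024, 7.5),
    ],
)


def totals_fetch_everything(conn, year):
    """Pull every row, then filter and sum on the client side."""
    totals = {}
    for region, row_year, amount in conn.execute(
        "SELECT region, year, amount FROM orders"
    ):
        if row_year == year:
            totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn, year):
    """Let the database filter and aggregate; only the summary is returned."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders WHERE year = ? GROUP BY region",
        (year,),
    )
    return dict(rows.fetchall())
```

Both return the same totals, but the second version moves far less data. The same idea should carry over to Spark: filter and aggregate as early as possible instead of relying on the cluster to absorb the cost.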
Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street but unsure if you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)
I've faced many issues with this... Developers are using PySpark and they're blindly writing shitty code. That's a huge problem.