DEV Community

[Comment from a deleted post]
 
Helen Anderson

The expectation is that everything should return in seconds, and if it doesn't, the database must be the issue.

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.
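
A minimal sketch of that kind of rewrite, using Python's built-in sqlite3 module rather than Spark so it runs anywhere. The `orders` table, its columns, the sample rows, and both function names are made up for illustration:

```python
import sqlite3

# Hypothetical data: an in-memory `orders` table invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("north", 2019, 100.0),
        ("north", 2020, 250.0),
        ("south", 2020, 75.0),
        ("south", 2020, 125.0),
    ],
)


def totals_unfiltered(conn):
    """Pull every row back, then filter and sum on the client side."""
    totals = {}
    query = "SELECT region, year, amount FROM orders"
    for region, year, amount in conn.execute(query):
        if year == 2020:
            totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Let the database filter and aggregate: one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders WHERE year = ? GROUP BY region",
        (2020,),
    )
    return dict(rows)
```

Both functions produce the same totals. The second one only moves one row per region out of the database, which is where the difference would show up on a large table.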

 
geraldew

Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough, but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street, while being unsure whether you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)

 
Maxime Moreau

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.

I've faced many issues with this... Developers are using PySpark and they're blindly writing shitty code. That's a huge problem.