I'm curious what you think about the functional API on top of SQL that Spark provides. It complicates the "imperative vs. SQL" framing, because with the DataFrame API you're metaprogramming SQL while still writing Scala/Python/Java. From my perspective, the DataFrame API vs. plain SQL is the really tough choice (and it doesn't have to be a choice, since Spark supports both in the same pipeline). Thanks for the article!
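To make "metaprogramming SQL" and "both in the same pipeline" concrete, here is a toy sketch using only Python's built-in `sqlite3`. The `Query` class and its `group_by_agg` method are hypothetical stand-ins, not Spark's API: method chaining builds a query that is only turned into SQL when you ask for it, and the result is registered as a view so the next step can continue in plain SQL. In real Spark the rough equivalents would be `DataFrame.filter`, `groupBy(...).agg(...)`, `createOrReplaceTempView` and `spark.sql`, and Spark compiles both styles into an optimizer plan rather than into SQL text.

```python
import sqlite3


class Query:
    """Hypothetical, minimal DataFrame-style builder (not Spark's API).

    Each method returns a new Query; no SQL runs until .collect() is called.
    """

    def __init__(self, table, where=(), group_by=None, aggs=None):
        self.table = table
        self.where = tuple(where)
        self.group_by = group_by
        self.aggs = aggs or {}

    def filter(self, condition):
        # Add a WHERE predicate without mutating the original query.
        return Query(self.table, self.where + (condition,), self.group_by, self.aggs)

    def group_by_agg(self, column, **aggs):
        # aggs maps output column name -> SQL aggregate expression.
        return Query(self.table, self.where, column, aggs)

    def sql(self):
        # "Compile" the chained calls into a SQL string.
        if self.group_by:
            cols = [self.group_by] + [f"{expr} AS {name}" for name, expr in self.aggs.items()]
        else:
            cols = ["*"]
        text = f"SELECT {', '.join(cols)} FROM {self.table}"
        if self.where:
            text += " WHERE " + " AND ".join(f"({c})" for c in self.where)
        if self.group_by:
            text += f" GROUP BY {self.group_by} ORDER BY {self.group_by}"
        return text

    def collect(self, conn):
        return conn.execute(self.sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 5), ("east", 20), ("west", 15), ("west", 30)],
)

# Step 1: the "API" style, composed in Python.
big_sales = Query("sales").filter("amount > 10").group_by_agg("region", total="SUM(amount)")

# Step 2: expose that result as a view, then keep going in plain SQL.
conn.execute(f"CREATE TEMP VIEW region_totals AS {big_sales.sql()}")
rows = conn.execute("SELECT region FROM region_totals WHERE total > 20").fetchall()
```

Here `big_sales.collect(conn)` gives `[("east", 20), ("west", 45)]` and the follow-up SQL step keeps only `[("west",)]`. The point of the sketch is that the host-language code is really just assembling a query, which is why the choice between the two styles can be made per step rather than per pipeline.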
Honestly, anything that makes it easier/simpler to write pipelines is good in my book! (And Spark's SQL APIs definitely count there.)
That said, at least for data that can easily be brought into a single data warehouse, my preference is still to get rid of that extra layer (reducing complexity) and let the highly scalable warehouse do the grunt work.