Thanks for writing this. I’m interested to learn more.
expressing pipelines
First, the term “expressing pipelines”: I am not sure I understand it completely.
Is it a case of, say, running queries to extract data for reports, meaning purely read-only?
Does it also include extracting data from one data source and then loading it into another, i.e. read and write?
difference between running SQL pipelines and something like Blaze
Are you familiar with github.com/blaze/blaze?
Their philosophy is that as data grows bigger, it’s easier to send code to the data than to send data to the code for processing.
Conceptually, are you suggesting the same thing, except that you recommend using SQL queries directly?
Thanks
Thanks for reading my post!
To my mind, a processing pipeline is anything that reads data from a number of source(s), joins/transforms/filters those data, and outputs the results to some number of destination(s). (Note that it is rare, but occasionally the output destination is the same as the input source.) So I would say both of your examples would qualify.
I wasn't familiar with Blaze, but having had a quick look, it does seem that I am suggesting a similar approach, just going straight to SQL instead.
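To make that definition a bit more concrete, here is a minimal sketch of such a pipeline, assuming Python's built-in `sqlite3` module and an in-memory database. The table names (`customers`, `orders`, `customer_totals`), the columns, and the sample rows are all hypothetical, made up purely for illustration. The two source tables are read, joined, filtered and aggregated, and the result is written to a destination table, with the whole transformation expressed as a single SQL statement that runs inside the database.

```python
import sqlite3


def run_pipeline(conn: sqlite3.Connection) -> None:
    """Read two source tables, join/filter/aggregate them, write a destination table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS customer_totals;
        -- Destination: one row per customer with the sum of their paid orders.
        CREATE TABLE customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id   -- join the two sources
        WHERE o.status = 'paid'                       -- filter
        GROUP BY c.name;                              -- transform (aggregate)
        """
    )


# Hypothetical source data, created here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        status TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (1, 1, 10.0, 'paid'),
        (2, 1, 5.0, 'refunded'),
        (3, 2, 7.5, 'paid'),
        (4, 2, 2.5, 'paid');
    """
)

run_pipeline(conn)
results = conn.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()
print(results)  # [('Ada', 10.0), ('Grace', 10.0)]
```

Note that Python only hands the SQL text to the database here; the rows themselves never pass through Python until the final `SELECT`, which is roughly the "send code to the data" idea raised in the question.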
Actually, when you define a processing pipeline as "anything that reads data from a number of source(s), joins/transforms/filters those data, and outputs the results to some number of destination(s)", you're essentially talking about ETL, right?
More or less, yes!