Discussion on: Building a Raspberry Pi Hadoop / Spark Cluster

View post

Thanks for this excellent guide! Haven't tried it yet, but looks remarkably complete and pedagogical.

Some questions:

regarding cost: you give an awesome hardware summary. How much time did you invest? As always, probably the human resource is the most expensive part (given the price of your level of expertise on the free market).
have you measured the performance of this cluster, and can you compare it to realistic real-world scenarios with x86 machines?
have you installed / tried to install Apache Impala on the cluster? Not sure that's "physically" possible, but I would guess you are the authority who can give a final answer. I'm asking, because I'm under the impression that Impala is by far the fastest SQL solution for Hadoop.
Further on SQL: how do queries on your cluster compare in performance to queries on a single big machine with a plain Postgress database (or MariaDB, or whatever)?
Finally, do you think this setup fills an empty spot in computing space for the commercial market? I'm thinking of small companies who want a better data cluster but cannot afford full-fledged big-data solutions.