Discussion on: Building a Raspberry Pi Hadoop / Spark Cluster

Andreas Komninos

Thank you for this superb article. I have been following it to deploy a Hadoop/Spark cluster using the latest Raspberry Pi 4 (4 GB). I ran into one problem: after completing the tutorial, the Spark job was not being assigned. I got the message
INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers
and then it got stuck repeating
INFO yarn.Client: Application report for application_1564655698306_0001 (state: ACCEPTED)
I will describe below how I solved this.

First, I want to note that the latest Raspbian version (Buster) does not include Oracle Java 8, which is required by Hadoop 3.2.0. There is no easy way to get it set up, but it can be done. First you need to manually download the tar.gz file from Oracle's site (this requires registration). I put it up on a personal webserver so that it could be easily downloaded to the Pis. Then, on each Pi:

download java package

cd ~/Downloads
wget /jdk8.tar.gz

extract package contents

sudo mkdir /usr/java
cd /usr/java
sudo tar xf ~/Downloads/jdk8.tar.gz

update alternative configurations

sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.8.0_221/bin/java 1000
sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.8.0_221/bin/javac 1000

select desired java version

sudo update-alternatives --config java

check the java version changes

java -version

Next, here is how I solved the YARN problem. In your tutorial section "Configuring Hadoop on the Cluster", after the modifications to the xml files have been made on Pi1, two files need to be copied across to the other Pis: these are yarn-site.xml and mapred-site.xml. After copying, YARN needs to be restarted on Pi1.
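
For reference, the copy-and-restart step looks roughly like this (a sketch: I am assuming Hadoop is installed under /opt/hadoop as in the article and that the other Pis are reachable as pi2, pi3 and pi4 — adjust the hostnames to your cluster):

copy the edited configuration files to the other Pis

for pi in pi2 pi3 pi4; do
  scp /opt/hadoop/etc/hadoop/yarn-site.xml /opt/hadoop/etc/hadoop/mapred-site.xml $pi:/opt/hadoop/etc/hadoop/
done

restart YARN on Pi1 so the new settings take effect

stop-yarn.sh
start-yarn.sh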

To set appropriate values for the memory settings, I found a useful tool which is described on this thread stackoverflow.com/questions/495791...

Copy-pasting the instructions:

get the tool

wget public-repo-1.hortonworks.com/HDP/...
tar zxvf hdp_manual_install_rpm_helper_files-2.6.0.3.8.tar.gz
rm hdp_manual_install_rpm_helper_files-2.6.0.3.8.tar.gz
mv hdp_manual_install_rpm_helper_files-2.6.0.3.8/ hdp_conf_files

run the tool

python hdp_conf_files/scripts/yarn-utils.py -c 4 -m 8 -d 1 -k False

-c number of cores per node
-m amount of memory per node (in GB)
-d number of disks per node
-k "True" if HBase is installed; "False" if not

This should provide appropriate settings to use. After the XML files have been edited and YARN has been restarted, you can run the following command to check that all the worker nodes are active:

yarn node -list
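
For completeness, the numbers the tool reports map onto the standard YARN/MapReduce memory properties. A sketch of the relevant yarn-site.xml entries (the property names are the standard ones; the values below are placeholders — use whatever the tool prints for your hardware):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>

The corresponding map/reduce container sizes (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, yarn.app.mapreduce.am.resource.mb) go into mapred-site.xml.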

Sliver88 • Edited

First of all I'd like to thank Andrew for a superb tutorial. Apart from some minor alterations I had to make, I was able to set up HDFS etc., but I am now running into the same problem as you, Andreas.
The first thing I'd like to add to your recommendations is that installing Java is easier with
sudo apt-get install openjdk-8-jdk
and then changing the default (as you already suggested):
sudo update-alternatives --config java
sudo update-alternatives --config javac

Then change export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::") to export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-armhf, both in ~/.bashrc and in /opt/hadoop/etc/hadoop/hadoop-env.sh.
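
In other words, the same line goes into both files, and afterwards a quick sanity check (assuming the Hadoop binaries are already on your PATH) confirms the change:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-armhf

source ~/.bashrc
echo $JAVA_HOME
java -version
hadoop version

If JAVA_HOME points to the wrong place, the hadoop command will complain that it cannot find Java.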

The part where I have been stuck for a while, though, is that the yarn node -list command hangs, and if I try to run a Spark job it also gets stuck at the ACCEPTED stage.
I haven't yet tried your suggestion.

PS: I know it is a year-old article, but it is still the best I've seen so far in my research.

Ευχαριστώ πολύ και τους 2 (I would like to thank you both)

PiDevi • Edited

Hi Andreas,

I am running Raspbian Buster on my Pis, too. I downloaded the "Linux ARM 64 Hard Float ABI" package (jdk-8u231-linux-arm64-vfp-hflt.tar.gz), followed your instructions, and I get the following error when running java -version:

-bash: /usr/bin/java: cannot execute binary file: Exec format error

I guess this Java build is not compatible with the Pi. Which exact file did you download from the Oracle site?
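
For reference, which JDK builds the OS can actually execute can be checked with standard Debian commands (nothing Pi-specific assumed here):

dpkg --print-architecture
uname -m

On a 32-bit Raspbian Buster installation the first command reports armhf, in which case the 32-bit ARM (vfp-hflt) JDK build is needed rather than the arm64 one.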