Thank you for this superb article. I have been following it to deploy a Hadoop/Spark cluster using the latest Raspberry Pi 4 (4GB). I encountered one problem: after completing the tutorial, Spark jobs were not being assigned to any node. I got the warning

INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers

and then the job got stuck repeating

INFO yarn.Client: Application report for application_1564655698306_0001 (state: ACCEPTED)

I will describe below how I solved this.
First, I want to note that the latest Raspbian release (Buster) does not include Oracle Java 8, which Hadoop 3.2.0 requires. There is no easy way to get it set up, but it can be done. First you need to manually download the tar.gz file from Oracle's site (this requires registration). I put it on a personal webserver so it could be easily downloaded from the Pis. Then, on each Pi:
# download the java package
cd ~/Downloads
wget /jdk8.tar.gz

# extract the package contents
sudo mkdir /usr/java
cd /usr/java
sudo tar xf ~/Downloads/jdk8.tar.gz

# update the alternatives configuration
sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.8.0_221/bin/java 1000
sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.8.0_221/bin/javac 1000

# select the desired java version
sudo update-alternatives --config java

# check that the java version has changed
java -version
Next, here is how I solved the YARN problem. In your tutorial section "Configuring Hadoop on the Cluster", after the modifications to the XML files have been made on Pi1, two files need to be copied across to the other Pis: yarn-site.xml and mapred-site.xml. After copying, YARN needs to be restarted on Pi1.
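One way to do the copy and restart (a sketch: the hostnames pi2–pi4 and the /opt/hadoop location are assumptions based on this thread; adjust them for your cluster):

```shell
# push the edited configs from Pi1 out to each worker Pi
for host in pi2 pi3 pi4; do
  scp /opt/hadoop/etc/hadoop/yarn-site.xml \
      /opt/hadoop/etc/hadoop/mapred-site.xml \
      "$host":/opt/hadoop/etc/hadoop/
done

# restart YARN on Pi1 so the new settings take effect
stop-yarn.sh
start-yarn.sh
```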
To set appropriate values for the memory settings, I found a useful tool, which is described in this thread: stackoverflow.com/questions/495791...
Copy-pasting the instructions:
# get the tool
wget public-repo-1.hortonworks.com/HDP/...
tar zxvf hdp_manual_install_rpm_helper_files-2.6.0.3.8.tar.gz
rm hdp_manual_install_rpm_helper_files-2.6.0.3.8.tar.gz
mv hdp_manual_install_rpm_helper_files-2.6.0.3.8/ hdp_conf_files

# run the tool
python hdp_conf_files/scripts/yarn-utils.py -c 4 -m 8 -d 1 false
-c     number of cores per node
-m     amount of memory per node, in GB
-d     number of disks per node
-bool  "True" if HBase is installed; "False" if not
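For a rough sense of what the script computes, this is the sizing arithmetic as I understand it from the HDP documentation. The reserved-memory and minimum-container-size values below are my assumptions for a node with about 4 GB of RAM; trust the tool's own output over this sketch.

```shell
# example node: 4 cores, 4 GB RAM, 1 disk, no HBase
CORES=4; RAM_MB=4096; DISKS=1
RESERVED=1024        # memory left for the OS (assumed value for a 4 GB node)
MIN_CONTAINER=512    # minimum container size (assumed value for a 4 GB node)
USABLE=$((RAM_MB - RESERVED))

# containers = min(2 * cores, 1.8 * disks, usable / min_container), at least 1
A=$((2 * CORES))
B=$((18 * DISKS / 10))
C=$((USABLE / MIN_CONTAINER))
CONTAINERS=$A
if [ "$B" -lt "$CONTAINERS" ]; then CONTAINERS=$B; fi
if [ "$C" -lt "$CONTAINERS" ]; then CONTAINERS=$C; fi
if [ "$CONTAINERS" -lt 1 ]; then CONTAINERS=1; fi

# memory per container, never below the minimum container size
PER_CONTAINER=$((USABLE / CONTAINERS))
if [ "$PER_CONTAINER" -lt "$MIN_CONTAINER" ]; then PER_CONTAINER=$MIN_CONTAINER; fi

echo "yarn.nodemanager.resource.memory-mb  = $((CONTAINERS * PER_CONTAINER))"
echo "yarn.scheduler.minimum-allocation-mb = $PER_CONTAINER"
echo "yarn.scheduler.maximum-allocation-mb = $((CONTAINERS * PER_CONTAINER))"
```

With a single disk the 1.8 * disks term dominates, so a 4 GB Pi ends up with one large (3072 MB) container per node.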
This should provide appropriate settings to use. After the XML files have been edited and YARN has been restarted, you can run the following command to check that all the worker nodes are active.
yarn node -list
First of all I'd like to thank Andrew for a superb tutorial. Apart from some minor alterations I had to make, I was able to set up HDFS etc., but I am now running into the same problem as you, Andreas.
The first thing I'd like to add to your recommendations is that there is an easier way to install Java:
sudo apt-get install openjdk-8-jdk
and then change the default (as you suggested already):
sudo update-alternatives --config java
sudo update-alternatives --config javac
Then change export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::") to export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-armhf, both in ~/.bashrc and in /opt/hadoop/etc/hadoop/hadoop-env.sh.
The part I have been stuck on for a while, though, is that the yarn node -list command hangs,
and if I try to run a Spark job it also gets stuck at the ACCEPTED state.
I haven't yet tried your suggestion.
PS: I know it is a year-old article, but it is still the best I've seen so far in my research.
Ευχαριστώ πολύ και τους 2 (I would like to thank you both)
Hi Andreas,
I am running Raspbian Buster on my Pis, too. I downloaded the "Linux ARM 64 Hard Float ABI" package (jdk-8u231-linux-arm64-vfp-hflt.tar.gz) and followed your instructions, but I get the following error when running java -version:

-bash: /usr/bin/java: cannot execute binary file: Exec format error

I guess this Java build is not compatible with the Pi. Which exact file did you download from the Oracle site?
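For what it's worth, an "Exec format error" usually means the binary was built for a different CPU architecture: Raspbian Buster is a 32-bit OS, so a "Linux ARM 64" build cannot run on it, and the "Linux ARM 32 Hard Float ABI" download should be the matching one. A quick way to check what your Pi's OS expects:

```shell
# print the machine architecture the kernel reports
uname -m    # a 32-bit Raspberry Pi OS reports armv7l
```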