
Building Hadoop native libraries on Mac in 2019

Saša Zejnilović ・ 5 min read

TL;DR at the end.

Recently I found myself in a situation where I "needed" Hadoop native libraries. Well, when I say "needed", I mean I was just fed up with constant warnings like this one:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

So I thought I would build my own Hadoop native libraries. How hard can it be, right? Honest answer? Less than an hour without a tutorial, and fifteen minutes with one, most of which is compilation time. While searching, I found that a lot of tutorials and guides were either outdated or didn't cover everything needed for a full compilation and installation. That is why I wrote my own, which I tested on two independent Macs, so it should be "tested enough".

Why do it

There was no real-world issue I was hoping to solve. I just had a few minutes on my hands and used them to learn something new. That said, I did read about cases where the native libraries bring speed improvements, which is nice when developing or testing locally, because local machines tend to be slow and any improvement is welcome. I also saw two random articles a while back whose authors had issues with the Java libraries, but the chances of you running into the same issues are really small.

Dependencies

First of all, we need to install the build dependencies. I am including links so you can check exactly what you are going to install:

(Please note I am skipping Maven, Java and other tools I assume you already have, as well as the Hadoop installation itself. If I am wrong, tell me and let's update the article. There is a great article about installing Hadoop on Mac by Zhang Hao here.)

To install most of these, I will be using Homebrew. It's a good tool, it has a one-line installation and you can be productive with it very quickly. Since the link covers everything you need, I am skipping its installation here.

If you are not a first-time Homebrew user, update and upgrade your tools first. If you have been using it for a while and want to keep some packages at their current version, use brew pin like this.

# Update
brew update
brew upgrade

# Then the installation
brew install wget gcc autoconf automake libtool cmake snappy gzip bzip2 zlib openssl
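Before moving on, it can save you a failed build later to confirm that the command-line tools are actually on your PATH. This is a minimal sketch, assuming a POSIX-compatible shell; `require_tools` is a hypothetical helper of my own, and the tool list in the usage line is just the commands this article calls directly (Maven and Java included, which I assume you already have).

```shell
# Hypothetical helper: report every given tool that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch (tools called directly by the steps in this article):
# require_tools wget autoconf automake cmake mvn java
```

It only checks for the commands, not their versions; the protobuf and OpenSSL versions need their own checks.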

As you may have noticed, one of the dependencies listed above is missing from the install command. Yes! It is protobuf, which has been deprecated and can't easily be installed with Homebrew. So let's build our own. It's cleaner that way and much more fun than it sounds. First we need to download it from GitHub and unpack it somewhere. You can delete it right afterwards, so you don't need a special folder structure.

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0

Next comes building it and making sure everything went smoothly. It takes some time, and I advise you to run it step by step so you can see what is happening. Some warnings here and there are normal, so you can ignore those.

./configure
make
make check
make install
# And just to check if everything is ok.
# This should print libprotoc 2.5.0
protoc --version
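If you have more than one protobuf around (for example a newer one from Homebrew), it is worth making sure the `protoc` found first on PATH really is 2.5.0, since the Hadoop build for the versions covered here checks for that exact version. A small sketch, assuming the `libprotoc 2.5.0` output format shown above; `protoc_version` and `check_protoc` are hypothetical helper names.

```shell
# Hypothetical helper: extract the version number from `protoc --version`
# output, which looks like "libprotoc 2.5.0". Prints nothing otherwise.
protoc_version() {
  printf '%s\n' "$1" | awk '$1 == "libprotoc" { print $2 }'
}

# Hypothetical helper: fail early if the protoc on PATH is not 2.5.0.
check_protoc() {
  local found
  found=$(protoc_version "$(protoc --version 2>/dev/null)")
  if [ "$found" != "2.5.0" ]; then
    echo "expected protoc 2.5.0, found '${found:-none}'" >&2
    return 1
  fi
}
```

Run `check_protoc` in the same shell you will start the Maven build from, so it sees the same PATH.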

OpenSSL setup

Next, we link the OpenSSL headers by hand, because Homebrew refuses to link OpenSSL and the compiler needs them. This is known, intentional Homebrew behavior, and the fix is a single ln:

cd /usr/local/include
ln -s ../opt/openssl/include/openssl .
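To confirm that the symlink really exposes the headers, and to see which OpenSSL they belong to (the comments at the end point out that Hadoop 2.x does not compile against OpenSSL 1.1), a sketch like this reads `OPENSSL_VERSION_NUMBER` from `opensslv.h`. It assumes the 1.0/1.1-style header layout and the `/usr/local/include` prefix used above; `openssl_header_version` is a hypothetical helper.

```shell
# Hypothetical helper: print "<major>.<minor>" of the OpenSSL headers found in
# the given include directory (default: /usr/local/include, as above).
# Returns 1 if the header is missing or its version line is not recognised.
openssl_header_version() {
  local header hex major minor
  header="${1:-/usr/local/include}/openssl/opensslv.h"
  [ -f "$header" ] || return 1
  # 1.x headers contain e.g. "# define OPENSSL_VERSION_NUMBER  0x1000214fL":
  # the first hex digit is the major version, the next two the minor version.
  hex=$(sed -n 's/^#[[:space:]]*define[[:space:]]*OPENSSL_VERSION_NUMBER[[:space:]]*0x\([0-9a-fA-F]\{3\}\).*/\1/p' "$header" | head -n 1)
  [ -n "$hex" ] || return 1
  major=$(printf '%s' "$hex" | cut -c1)
  minor=$(printf '%s' "$hex" | cut -c2-3)
  echo "$((0x$major)).$((0x$minor))"
}
```

Running `openssl_header_version` with no argument right after the `ln` should print `1.0` if the link points at OpenSSL 1.0 headers.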

This solves an error that looks something like this:

[exec] -- Configuring incomplete, errors occurred!
[exec] See also /Users/user/github/hadoop/hadoop-tools/hadoop-pipes/target/native/CMakeFiles/CMakeOutput.log.
[exec] CMake Error at /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
[exec]   Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
[exec]   system variable OPENSSL_ROOT_DIR (missing: OPENSSL_INCLUDE_DIR)
[exec] Call Stack (most recent call first):
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindOpenSSL.cmake:413 (find_package_handle_standard_args)
[exec]   CMakeLists.txt:20 (find_package)
[exec]
[exec]

Building native libraries

And finally, building the libraries themselves! Again, this creates a folder that you can delete at the end. This is probably the first place where you need to change something: the version of Hadoop you are using.

git clone https://github.com/apache/hadoop.git
cd hadoop
# Change the version as needed
git checkout branch-<VERSION>
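# Build the distribution including the native code, skipping the tests.
mvn package -Pdist,native -DskipTests -Dtar
# After the build, copy the newly created libraries into your Hadoop installation.
# (The folder name follows the branch's project version, which may end in -SNAPSHOT.)
cp -R hadoop-dist/target/hadoop-<VERSION>/lib "$HADOOP_HOME"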
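The copy step assumes `HADOOP_HOME` already points at your Hadoop installation; this article does not set it (see the discussion at the end). If it is unset and unquoted, `cp` gets a single argument and just prints its usage text. A slightly more defensive sketch of the same copy, where `install_native_libs` is a hypothetical helper and the version argument is the same `<VERSION>` placeholder as above:

```shell
# Hypothetical helper: copy the built native libraries into $HADOOP_HOME.
# $1: the Hadoop version used in the target folder name (<VERSION> above)
# $2: path to your hadoop source checkout (defaults to the current directory)
install_native_libs() {
  local version="$1"
  local src="${2:-.}/hadoop-dist/target/hadoop-${version}/lib"

  # Refuse to run without HADOOP_HOME instead of handing cp a bad argument list.
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set; point it at your Hadoop installation first" >&2
    return 1
  fi
  if [ ! -d "$src" ]; then
    echo "no build output at $src; did mvn package succeed?" >&2
    return 1
  fi
  # Same copy as above: the result lands in $HADOOP_HOME/lib (including lib/native).
  cp -R "$src" "$HADOOP_HOME"
}
```

Usage from inside the cloned `hadoop` folder would be along the lines of `install_native_libs <VERSION>`.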

Setting up environment variables

Now the critical part: making your shell see the libraries. Whatever shell you are using, put this into your shell profile (.bashrc, .zshrc, etc.):

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native
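If you would rather your profile not misbehave when `HADOOP_HOME` is unset, or not leave a leading colon when `LD_LIBRARY_PATH` or `JAVA_LIBRARY_PATH` start out empty, here is a variant sketch of the same three exports. `path_append` is a hypothetical helper; like the original lines, this replaces any existing `HADOOP_OPTS`.

```shell
# path_append (hypothetical helper) prints "$1:$2", or just "$2" when $1 is
# empty, so an initially empty variable does not get a stray leading colon.
path_append() {
  if [ -n "$1" ]; then
    printf '%s:%s\n' "$1" "$2"
  else
    printf '%s\n' "$2"
  fi
}

if [ -n "${HADOOP_HOME:-}" ]; then
  hadoop_native_dir="${HADOOP_HOME}/lib/native"
  # Like the original, this replaces any existing HADOOP_OPTS.
  export HADOOP_OPTS="-Djava.library.path=${hadoop_native_dir}"
  export LD_LIBRARY_PATH="$(path_append "${LD_LIBRARY_PATH:-}" "$hadoop_native_dir")"
  export JAVA_LIBRARY_PATH="$(path_append "${JAVA_LIBRARY_PATH:-}" "$hadoop_native_dir")"
  unset hadoop_native_dir
else
  echo "HADOOP_HOME is not set; skipping Hadoop native library setup" >&2
fi
```

Open a new terminal (or source the profile) afterwards so the variables are picked up.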

This points Hadoop to the right library path, and everything should fall into place. The last thing left is to check that everything works (well, almost everything: bzip2 is acting up and I still haven't found a way to fix it; when I do, I will update this).

hadoop checknative -a

# The output should look something like this:
19/05/17 19:00:14 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
19/05/17 19:00:14 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/local/Cellar/hadoop/2.7.5/lib/native/libhadoop.dylib
zlib:    true /usr/lib/libz.1.dylib
snappy:  true /usr/local/lib/libsnappy.1.dylib
lz4:     true revision:99
bzip2:   false
openssl: true /usr/lib/libcrypto.35.dylib
19/05/17 19:00:14 INFO util.ExitUtil: Exiting with status 1
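With several libraries in the report it is easy to miss a `false`. A small sketch that lists only the unavailable ones, assuming the `name: true|false ...` layout shown above; `failed_native_libs` is a hypothetical helper. Note that `checknative -a` itself exits with status 1 when something is missing, as the last log line shows.

```shell
# Hypothetical helper: read `hadoop checknative -a` output on stdin and print
# the name of every library reported as false, one per line.
failed_native_libs() {
  awk -F':' '
    # Report lines look like "name:   true|false ..."; timestamped log lines
    # and the "Native library checking" header do not match this pattern.
    $1 ~ /^[A-Za-z0-9-]+ *$/ {
      name = $1
      sub(/ +$/, "", name)
      status = $2
      sub(/^ +/, "", status)
      sub(/ .*$/, "", status)
      if (status == "false") print name
    }
  '
}

# Usage sketch (log lines go to stderr, hence 2>&1):
# hadoop checknative -a 2>&1 | failed_native_libs
```

For the output above it would print just `bzip2`.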

Afterword

Hopefully everything is running smoothly and you no longer get those warnings. If this helped even one person, I am glad, because if there is no added value for the reader, it is just me talking to a wall. And if you found any issues in the code or the article, please tell me and I will fix whatever I can.

TL;DR

This is just a step-by-step shell script extracted from the text above.


Discussion


Depending on which Hadoop version you want to install, you may need to use an earlier Java version to package it. This can be done by temporarily changing the JAVA_HOME environment variable before running mvn package.

In my case, instead of

mvn package -Pdist,native -DskipTests -Dtar

I had to run

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home mvn package -Pdist,native -DskipTests -Dtar
 
 

Yes, you are right. I didn't think of that use case; I just assumed people would already have a compatible Java version installed or set as the default.

 

Thanks for this article. I get this error when building the package with the command below:

mvn package -Pdist,native -DskipTests -Dtar

branch: branch-3.2

[INFO] Apache Hadoop MapReduce NativeTask ................. FAILURE [ 1.766 s]

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.2.2-SNAPSHOT:cmake-compile (cmake-compile) on project hadoop-mapreduce-client-nativetask: make failed with error code 2 -> [Help 1]

 

It's because Hadoop doesn't currently support native builds on macOS.

See github.com/apache/hadoop/blob/trun...

Note that building Hadoop 3.1.1/3.1.2/3.2.0 native code from source is broken
on macOS. For 3.1.1/3.1.2, you need to manually backport YARN-8622. For 3.2.0,
you need to backport both YARN-8622 and YARN-9487 in order to build native code.
 

Hello, I can't see enough of the message. Do you have Java 8? Or could you share more of the error (the line with "Caused by")?

I have tried building this now and it works for me.

 

Yes, I have Java 8 and it's set as my JAVA_HOME. Hadoop and Hive are set up and working, though. I read that some compression codecs work only with the native libraries, and jobs will fail with the Java ones.

I tried this on macOS Mojave 10.14.6.

 

Hi Sasa,
Thank you for this! I'm a bit new to Hadoop and I'm trying to fix the "native library" issue based on your article. Everything is OK until the "Apache Hadoop Common" build:

macOS High Sierra
Hadoop version 2.9.2

~/hadoop/tmp/hadoop   branch-2.9.2  brew list
autoconf cmake gettext go libidn2 libunistring openjdk pcre2 pyenv-virtualenv sshpass wget
automake direnv git gzip libmpc maven openssl@1.1 pkg-config readline telnet zlib
bzip2 gcc gmp isl libtool mpfr pandoc pyenv snappy tree

openssl : ok
protoc : 2.5.0 ok

Error message:

[INFO] Apache Hadoop Common ............................... FAILURE [02:08 min]

I get:

[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:256:14: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->flags & EVP_CIPH_NO_PADDING) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:262:20: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] int b = context->cipher->block_size;
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:263:16: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->encrypt) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:310:14: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] if (context->flags & EVP_CIPH_NO_PADDING) {
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] /Users/ccompain/hadoop/tmp/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c:313:20: error: incomplete definition of type 'struct evp_cipher_ctx_st'
[WARNING] int b = context->cipher->block_size;
[WARNING] ~~~~~~~^
[WARNING] /usr/local/include/openssl/ossl_typ.h:90:16: note: forward declaration of 'struct evp_cipher_ctx_st'
[WARNING] typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
[WARNING] ^
[WARNING] 5 errors generated.

[WARNING] 5 errors generated.
[WARNING] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/crypto/OpensslCipher.c.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/hadoop_static.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
[WARNING] make: *** [all] Error 2

Any idea?
Thanks,
Christophe

 

Hello, in all honesty, I don't remember having this problem, but it seems you aren't alone. Apparently it is caused by OpenSSL itself. This apache.org Jira issue seems to describe precisely your problem, and it even points out the places in the C code that need fixing.

 

What version of OpenSSL did you use?

I used OpenSSL 1.1 from Homebrew and I get this error:

error: variable has incomplete type 'HMAC_CTX' (aka 'hmac_ctx_st')
 

This seems more like an issue with the OpenSSL installation than with the version you are using. Anyway, this is what I have on my current machine (I'm not sure which version I built it with, but the libs work):

╰─$ openssl version
LibreSSL 2.8.3
╰─$ brew list --versions | grep ssl
openssl 1.0.2t
openssl@1.1 1.1.1d
 

According to
issues.apache.org/jira/browse/HADO...

OpenSSL 1.1 broke the compilation. They patched it, but the fix wasn't included in the version 2 builds. Your tutorial used OpenSSL 1.0 (with OpenSSL 1.1 you would see openssl@1.1 in the path).

Too bad Homebrew has already deprecated OpenSSL 1.0.

Thank you very much! Will update the post.

This issue discusses how to forcefully install OpenSSL 1.0 using Homebrew
github.com/Homebrew/homebrew-core/...

 

Thanks for writing this. It helped me a lot.

 

When is $HADOOP_HOME defined?

When I run line 29 of the script, it fails and prints the usage text for cp.

 

I don't define $HADOOP_HOME myself. It is part of a proper Hadoop installation, which this article doesn't cover.