
Kevin Risden


Apache Livy - Apache Spark, HDFS, and Kerberos


Apache Livy provides a REST interface for interacting with Apache Spark. When a Spark job interacts with Apache Hadoop HDFS secured by Kerberos, a Kerberos delegation token must be obtained for the job. This tends to pose some issues due to how token delegation works.

spark-submit solves this by obtaining a delegation token on your behalf when the job is submitted. For this to work, the Hadoop configurations and JAR files must be on the spark-submit classpath. Exactly which configurations and JAR files are required is explained in the references here.
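For comparison, a plain spark-submit against a kerberized YARN cluster typically passes a principal and keytab so Spark can fetch HDFS delegation tokens itself; the principal, keytab path, and input path below are placeholders, not values from this cluster:

```shell
# Hypothetical example: with --principal and --keytab, spark-submit
# logs in to Kerberos and obtains HDFS delegation tokens on your behalf.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/user.keytab \
  --class SparkHDFSKerberos \
  hdfs:///user/user/SparkHDFSKerberos.jar \
  hdfs:///path/to/input.txt
```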

When using Livy with HDP, the Hadoop JAR files and configurations are already on the classpath for spark-submit. This means there is nothing special required to read/write to HDFS with a Spark job submitted through Livy.

If you are looking to do something similar with Apache HBase see this post.


The batch can be submitted to Livy through a Knox gateway with curl; the input path in `args` is a placeholder:

```shell
curl \
  -u ${USER} \
  --location-trusted \
  -H 'X-Requested-by: livy' \
  -H 'Content-Type: application/json' \
  -X POST \
  https://localhost:8443/gateway/default/livy/v1/batches \
  --data "{
    \"proxyUser\": \"${USER}\",
    \"file\": \"hdfs:///user/${USER}/SparkHDFSKerberos.jar\",
    \"className\": \"SparkHDFSKerberos\",
    \"args\": [\"hdfs:///path/to/input.txt\"]
  }"
```
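Once the batch is accepted, its state can be polled through the same gateway via Livy's GET batches endpoint; the batch id `0` is a placeholder for the id returned by the POST:

```shell
# Hypothetical example: poll the state of a submitted Livy batch
# (replace 0 with the id from the submission response).
curl \
  -u ${USER} \
  --location-trusted \
  https://localhost:8443/gateway/default/livy/v1/batches/0
```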


```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkHDFSKerberos {
  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName(SparkHDFSKerberos.class.getCanonicalName());
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    // Read the HDFS path passed as the first batch argument; the count
    // action forces the read, exercising the Kerberos delegation token.
    JavaRDD<String> textFile = jsc.textFile(args[0]);
    System.out.println(textFile.count());

    jsc.stop();
  }
}
```

