Need Help
I have been working with Docker, where I have to run a Spark application.
I tried using the Spark images from the Docker repository but ran into issues, so I built my own.
It works, but every run downloads Spark again and I lose the logs of previously run jobs.
My requirements
Is it possible to have a separate Spark image and supply app.jar to it?
Instead of writing logs inside the Docker container, can I direct them to the host file system?
Dockerfile
FROM alpine
ENV SPARK_VERSION=2.2.0
ENV HADOOP_VERSION=2.7
RUN apk add tar
RUN apk add aria2
# WORKDIR creates /spark and switches to it; a RUN cd only affects that single layer
WORKDIR /spark
#copy app.properties to docker
COPY app.properties .
# copy /home/exa9/SparkSubmit/App/target/App-0.0.1-SNAPSHOT.jar
ADD target/App-0.0.1-SNAPSHOT.jar app.jar
# Download Apache Spark (extracted in the next step)
RUN aria2c -x16 http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN apk add --no-cache curl bash openjdk8-jre \
&& tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
WORKDIR /spark/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master local[*] /spark/app.jar /spark/app.properties
Top comments (1)
You can mount a host directory as a volume in your container and store the logs there. That way the logs live on the host and survive the container. As for the Spark re-download issue, you have to find another way to include the Spark binary. Since you're writing a Java application, using Maven or Gradle would have made that a lot easier; it would have been just a build script away!
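For the log part, a rough sketch of that mount (the image name my-spark-app and the paths /home/exa9/spark-logs and /spark/logs are just placeholders; point them at whatever your image is tagged and wherever your job actually writes its logs):

docker run --rm -v /home/exa9/spark-logs:/spark/logs my-spark-app

Everything the job writes under /spark/logs inside the container then shows up in /home/exa9/spark-logs on the host and stays there after the container exits. If you also want Spark's own event logs there, you could add --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///spark/logs to the spark-submit line.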
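And for the re-download, one way that could look (the tag spark-base:2.2.0 and the file names Dockerfile.spark-base / Dockerfile.app are made up for this sketch): build a base image that only installs Java and Spark once, then a tiny app image on top of it that just copies the jar. Rebuilding the app image never re-downloads Spark; you could even skip the app image entirely and mount the jar at run time with another -v.

Dockerfile.spark-base (built once, e.g. docker build -f Dockerfile.spark-base -t spark-base:2.2.0 .):

FROM alpine
ENV SPARK_VERSION=2.2.0
ENV HADOOP_VERSION=2.7
RUN apk add --no-cache curl bash openjdk8-jre aria2 tar
WORKDIR /spark
# Spark is downloaded only when this base image is built
RUN aria2c -x16 http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
 && tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
 && rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

Dockerfile.app (rebuilt on every code change, but it only copies the jar and properties):

FROM spark-base:2.2.0
COPY app.properties /spark/
COPY target/App-0.0.1-SNAPSHOT.jar /spark/app.jar
WORKDIR /spark/spark-2.2.0-bin-hadoop2.7/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master local[*] /spark/app.jar /spark/app.properties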