kambala yashwanth

Need help dockerizing Spark

I have been working with Docker, where I have to run a Spark application. I tried the Spark images from the Docker repository but ran into issues, so I tried building my own.

It worked out, but every run downloads Spark again, and I am losing the logs of previously run jobs.

My requirements (a rough sketch of what I'm after follows the list):

  1. Is it possible to have a separate Spark image and supply app.jar to it?

  2. Instead of writing logs inside the container, can I direct them to the host file system?
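
Roughly what I have in mind: a generic Spark image built once, with the jar and a log directory supplied from the host at run time (the image name my-spark-image and the /spark/logs path are hypothetical):

docker run \
  -v $(pwd)/target/App-0.0.1-SNAPSHOT.jar:/spark/app.jar \
  -v $(pwd)/logs:/spark/logs \
  my-spark-image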

Dockerfile

FROM alpine

ENV SPARK_VERSION=2.2.0
ENV HADOOP_VERSION=2.7

# tar and aria2 are needed to download and unpack Spark
RUN apk add tar
RUN apk add aria2

# WORKDIR creates /spark and switches into it, so a separate
# mkdir/cd is not needed (RUN cd does not persist between layers)
WORKDIR /spark



# copy app.properties into the image
COPY app.properties .

# copy the application jar, built at /home/exa9/SparkSubmit/App/target/App-0.0.1-SNAPSHOT.jar
# (COPY is preferred over ADD for plain local files)
COPY target/App-0.0.1-SNAPSHOT.jar app.jar


# download Apache Spark (using the version variables above) and extract it
RUN aria2c -x16 http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

RUN apk add --no-cache curl bash openjdk8-jre \
      && tar -xvzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz





WORKDIR /spark/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master local[*] /spark/app.jar /spark/app.properties
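
I suspect the re-download happens because the COPY of app.jar comes before the Spark download: any change to the jar invalidates Docker's layer cache for every later step, so each rebuild fetches Spark again. A reordered sketch (untested) that should keep the Spark layer cached across rebuilds:

FROM alpine

ENV SPARK_VERSION=2.2.0
ENV HADOOP_VERSION=2.7

RUN apk add --no-cache tar aria2 curl bash openjdk8-jre
WORKDIR /spark

# download and extract Spark first, so this layer stays cached
# as long as the lines above it do not change
RUN aria2c -x16 http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

# only these layers are rebuilt when the application changes
COPY app.properties .
COPY target/App-0.0.1-SNAPSHOT.jar app.jar

WORKDIR /spark/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master local[*] /spark/app.jar /spark/app.properties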






Top comments (1)

Shawon Ashraf

You can mount a directory as a volume in your container and store the logs there. That way your logs will remain free from side effects. As for the Spark re-download issue, you'll have to find another way to include the Spark binary. Since you're writing a Java application, using Maven or Gradle would've made that a lot easier; it would've been just a build script away!
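
For instance, a minimal sketch of the volume approach (the host path, image name, and container log directory here are assumptions; point them at wherever the job actually writes its logs):

docker run -v /home/exa9/spark-logs:/spark/logs my-spark-app

Anything the container writes under /spark/logs then lands in /home/exa9/spark-logs on the host and survives container removal.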
