Need help dockerizing Spark

#help #docker #devops

Need Help

I have been working with Docker, where I have to run a Spark application.
I tried the Spark images from the Docker repository but ran into issues, so I tried building my own.

It worked, but every run downloads Spark again, and I lose the logs of previously run jobs.

My requirements

Is it possible to have a separate Spark image and supply app.jar to it?
Instead of writing logs inside the container, can I direct them to the host file system?

Dockerfile

FROM alpine

ENV SPARK_VERSION=2.2.0
ENV HADOOP_VERSION=2.7

# Install the download/extract tools plus bash and the Java 8 runtime Spark needs
RUN apk add --no-cache tar aria2 curl bash openjdk8-jre

# WORKDIR creates /spark if it does not exist (a separate "RUN cd" has no lasting effect)
WORKDIR /spark

# Copy app.properties into the image
COPY app.properties .

# Copy the application jar, built at /home/exa9/SparkSubmit/App/target/App-0.0.1-SNAPSHOT.jar
COPY target/App-0.0.1-SNAPSHOT.jar app.jar

# Download and extract Apache Spark
RUN aria2c -x16 http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
      && tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

WORKDIR /spark/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}/bin
CMD ./spark-submit --class com.Spark.Test.SparkApp.App --master 'local[*]' /spark/app.jar /spark/app.properties

Top comments (1)

Shawon Ashraf • Sep 6 '19 • Edited

You can mount a host directory as a volume in your container and store the logs there. That way the logs live on the host and survive after the container is removed. As for the Spark re-download issue, you'll have to find another way to include the Spark binary. Since you're writing a Java application, using Maven or Gradle would've made that a lot easier and would've been just a build script away!
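
To make the volume idea concrete, here is a rough sketch of how the run could look. It assumes a Spark-only image has already been built once and tagged `spark-base:2.2.0` (a hypothetical tag, not something from the post), with the jar and `app.properties` left out of the image and bind-mounted at run time instead. The `/spark/logs` mount point, the `spark-logs` host directory, and the `DRY_RUN` switch are also made up for illustration. It captures console output only, so adjust it if your app logs elsewhere.

```shell
#!/bin/sh
set -eu

# All defaults below are assumptions for illustration; override via environment.
IMAGE="${IMAGE:-spark-base:2.2.0}"                  # hypothetical tag of a prebuilt Spark image
JAR="${JAR:-$PWD/target/App-0.0.1-SNAPSHOT.jar}"    # application jar on the host
PROPS="${PROPS:-$PWD/app.properties}"               # properties file on the host
LOG_DIR="${LOG_DIR:-$PWD/spark-logs}"               # host directory that keeps the logs
SPARK_BIN=/spark/spark-2.2.0-bin-hadoop2.7/bin      # path used in the Dockerfile above

# Command executed inside the container: run the job and redirect its console
# output to a timestamped file in the mounted log directory.
inner_cmd() {
  printf '%s' "$SPARK_BIN/spark-submit --class com.Spark.Test.SparkApp.App --master 'local[*]' /spark/app.jar /spark/app.properties > /spark/logs/run-\$(date +%Y%m%d-%H%M%S).log 2>&1"
}

# Bind-mount the jar and properties read-only and the log directory read-write.
run_spark() {
  mkdir -p "$LOG_DIR"
  docker run --rm \
    -v "$JAR":/spark/app.jar:ro \
    -v "$PROPS":/spark/app.properties:ro \
    -v "$LOG_DIR":/spark/logs \
    "$IMAGE" sh -c "$(inner_cmd)"
}

# Dry run by default so the script can be inspected without Docker installed.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "would run in $IMAGE: $(inner_cmd)"
else
  run_spark
fi
```

Run it with `DRY_RUN=0` to actually start the container. Because the Spark image is built once and only the jar is mounted, rebuilding the app should not trigger another Spark download, and each run leaves its own log file in `spark-logs` on the host.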
