When I started the Udemy course "Streaming Big Data with Spark Streaming & Scala - Hands On!" by Sundog, I was sure that running the first task would be easy.
As I took my first steps in the Scala world, I realised there is a learning curve to getting to know common tools like SBT and working with IntelliJ.
Installing IntelliJ as an IDE for Scala development should be straightforward. However, there are some quirks in the latest version, so I am putting everything in one place for others to go through when needed:
- Install IntelliJ IDEA Community (currently 2020.1.1).
- On first startup, install the Scala plugin, or later, in the "Create Project" window, go to Configure -> Plugins, look for the Scala plugin, and install it.
- When creating the project, go to the "Scala" tab and make sure you use Scala 2.12.11 (explanation below) and the latest SBT. A minimal build.sbt sketch follows this list.
- Create the project.
- Wait until the indexing process completes. You can see the progress bar at the bottom of the IntelliJ window.
- After the project is created, right-click the root name -> click 'Add Framework Support...' -> add Scala.
- Right-click the src folder and choose Mark Directory as -> Sources Root.
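If the wizard did not generate it for you, this is roughly what a minimal build.sbt should look like (the project name and version here are placeholders):

// build.sbt - minimal sketch; name and version are arbitrary
name := "spark-streaming-sandbox"
version := "0.1"
scalaVersion := "2.12.11"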
Now everything should work. Create a file under src/main/scala with the following:
object Main {
  def main(args: Array[String]): Unit = {
    println("hi")
  }
}
Right-click the file and click "Run".
This code has no dependencies and should run fine.
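Alternatively, you can run it from the sbt shell (View -> Tool Windows -> sbt shell), the same shell we will use below for reload and update:

run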
Adding dependencies
I learned that adding dependencies is not straightforward. SBT downloads dependency files from repositories, and when a dependency fails to resolve you need to "debug" why.
Let's go through the process of adding a dependency with SBT. Add this import statement to the code above:
import org.apache.spark.streaming.StreamingContext

object Main {
  def main(args: Array[String]): Unit = {
    println("hi")
  }
}
When you click Run, you should encounter an error:

Error:(1, 12) object apache is not a member of package org
import org.apache.spark.streaming.StreamingContext
This means the dependency is missing.
Let's simulate a broken dependency and see how to fix it.
Say you found this dependency line on some website and added it to your build.sbt file:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.3.8"
- Run reload in the sbt shell
- Run update
You should see an error:
[error] stack trace is suppressed; run last update for the full output
[error] (update) sbt.librarymanagement.ResolveException: Error downloading org.apache.spark:spark-streaming_2.12:2.3.8
[error] Not found
[error] Not found
[error] not found: /Users/mrot/.ivy2/local/org.apache.spark/spark-streaming_2.12/2.3.8/ivys/ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.12/2.3.8/spark-streaming_2.12-2.3.8.pom
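Note the _2.12 suffix in the artifact name: the %% operator in build.sbt appends your project's Scala binary version to the artifact, so under Scala 2.12.x these two lines are equivalent:

// %% appends the Scala binary version (here 2.12) to the artifact name
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.3.8"
// the same dependency written explicitly with plain %
libraryDependencies += "org.apache.spark" % "spark-streaming_2.12" % "2.3.8"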
Browse to the Maven repo from the URL in the last line of the error:
https://repo1.maven.org/maven2/org/apache/spark/
and then navigate the directories until you find the version you seek.
There you can see that you need to update the Spark version of the dependency to at least 2.4.0 (the first release that publishes artifacts for Scala 2.12).
Let's change the dependency to:
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.4.0"
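Run reload and update again; this time they should succeed. To verify the dependency is really on the classpath, here is a minimal sketch that builds a StreamingContext (the app name and one-second batch interval are arbitrary placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Main {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for receiving data, one for processing it
    val conf = new SparkConf().setMaster("local[2]").setAppName("DependencyCheck")
    val ssc = new StreamingContext(conf, Seconds(1))
    println("StreamingContext created: " + ssc)
    // stop the streaming context and the underlying SparkContext
    ssc.stop(stopSparkContext = true)
  }
}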
You can also use the sbt tool window on the side, but I didn't find it as useful as the sbt shell.
The reason we use Scala version 2.12.11 is that I found most packages support it.