DEV Community

Bill Schneider


Learning Scala for Spark, and the apply method

This article originally appeared on my blog

Sometimes in Spark you will see code like

val df1 = ...
val df2 = ...
val df3 = df1.join(df2, df1("col") === df2("col"))

It is a little odd at first to see DataFrame objects called as if they were functions.

What's going on here?

In Scala, any object that defines an apply method can be called like a function: obj(foo) is equivalent to obj.apply(foo). DataFrame's apply method does the same thing as col, so df("col") is equivalent to df.col("col").
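To see the rule in isolation, without a Spark dependency, here is a minimal sketch. Table and Column are hypothetical stand-ins invented for this example, not Spark classes; the only point is that defining apply makes table("id") compile.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's classes.
final case class Column(name: String)

final class Table(columns: Seq[String]) {
  // Look up a column by name, failing on names the table does not have.
  def col(name: String): Column = {
    require(columns.contains(name), s"unknown column: $name")
    new Column(name)
  }

  // Defining apply is what allows table("id") as shorthand for table.col("id").
  def apply(name: String): Column = col(name)
}

object ApplyDemo {
  val table = new Table(Seq("id", "amount"))

  val viaApply = table("id")        // the compiler rewrites this to table.apply("id")
  val explicit = table.apply("id")  // the same call, written out
  val viaCol   = table.col("id")    // the method that apply delegates to
}
```

All three values are equal, which mirrors how df("col") and df.col("col") give you the same column in Spark.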

This is also why you can create instances of case classes without new: a case class gets a companion object with the same name, and that companion object has an apply method that passes its arguments along and returns new ClassName(...).
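A small sketch of the companion-object case, assuming Scala 2 (the version most Spark builds use). Point and Label are hypothetical names made up for this example. Label shows roughly what the compiler generates for you when you write a case class.

```scala
// A case class gets a compiler-generated companion object with an apply method.
final case class Point(x: Int, y: Int)

// A plain class needs a hand-written companion to support the same syntax.
final class Label(val text: String)

object Label {
  // Roughly what a case class companion provides automatically.
  def apply(text: String): Label = new Label(text)
}

object CompanionDemo {
  val p1 = Point(1, 2)        // sugar for Point.apply(1, 2)
  val p2 = Point.apply(1, 2)  // the same call, written out
  val p3 = new Point(1, 2)    // new still works on a case class

  val label = Label("total")  // calls Label.apply, which calls new Label("total")
}
```

The three Point values compare equal, since case classes also get structural equality.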

Personally, I haven't learned to like Scala's apply feature, because it's not entirely obvious what obj(foo) is supposed to do. But in this case, it makes sense to have shortcuts like that when I'm thinking of Scala as a DSL for Spark.
