DEV Community

loading...

Creating transformation abstraction

barak haver
・1 min read

Hi all,

I'm trying to create a base class that knows the structure of my final DF and child classes that implement the logic behind altering the data.

IE, something like:

    from abc import abstractmethod

    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import StringType


    class Parent:
        def __init__(self):
            self.temp_udf = udf(self.temp.__get__(object), StringType())


        @abstractmethod
        def temp(self, temp_obj):
            pass


        def work(self):
            df1 = spark.read.format("delta").load("...")
            df1 = df1.withColumn("temp", self.temp_udf("id"))
            df1.select("temp").show()


    class Child(Parent):
        def __init__(self):
            super(Child, self).__init__()


        def temp(self, temp_obj):
            return "Child" + temp_object # I'll do some logic here


    c = Child()
    c.work()
Enter fullscreen mode Exit fullscreen mode

​I keep failing due to the following error:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

Am I trying to achieve the impossible? Or am I missing something?

Discussion (0)