I have a PySpark project with a dataframe like the one shown below:
+----------+--------------------------------+
| Index    | flagArray                      |
+----------+--------------------------------+
| 1        | ['A','S','A','E','Z','S','S']  |
+----------+--------------------------------+
| 2        | ['A','Z','Z','E','Z','S','S']  |
+----------+--------------------------------+
I want to replace each array element with its corresponding numeric value:
A - 0
F - 1
S - 2
E - 3
Z - 4
So my output dataframe should look like this:
+----------+--------------------------------+--------------------------------+
| Index    | flagArray                      | finalArray                     |
+----------+--------------------------------+--------------------------------+
| 1        | ['A','S','A','E','Z','S','S']  | [0, 2, 0, 3, 4, 2, 2]          |
+----------+--------------------------------+--------------------------------+
| 2        | ['A','Z','Z','E','Z','S','S']  | [0, 4, 4, 3, 4, 2, 2]          |
+----------+--------------------------------+--------------------------------+
I have written a UDF in PySpark that does this with a series of if/else statements. Is there a better way to handle this?
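One possible direction, sketched under assumptions: the if/else chain can usually be replaced by a dictionary lookup. The function below is plain Python so it can be checked without a Spark session. The names `FLAG_CODES` and `encode_flags` are hypothetical, and mapping an unknown flag to None is an assumption, since the question does not say what should happen in that case.

```python
# Flag-to-number mapping taken from the question (A=0, F=1, S=2, E=3, Z=4).
FLAG_CODES = {"A": 0, "F": 1, "S": 2, "E": 3, "Z": 4}


def encode_flags(flags):
    """Return the numeric code for each flag in the input list.

    Assumption: a flag that is not in FLAG_CODES becomes None,
    and a null (None) array is passed through unchanged.
    """
    if flags is None:
        return None
    # dict.get returns None for keys that are not in the mapping.
    return [FLAG_CODES.get(flag) for flag in flags]


# The two rows from the question.
print(encode_flags(["A", "S", "A", "E", "Z", "S", "S"]))  # [0, 2, 0, 3, 4, 2, 2]
print(encode_flags(["A", "Z", "Z", "E", "Z", "S", "S"]))  # [0, 4, 4, 3, 4, 2, 2]
```

In PySpark, a function like this could be wrapped with `pyspark.sql.functions.udf`, using `ArrayType(IntegerType())` as the return type, and applied to `flagArray` through `withColumn` to produce `finalArray`. On Spark 3.1 or later, the built-in `transform` function combined with a map column built by `create_map` may avoid a Python UDF altogether; that variant is not shown here.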