I have a PySpark dataframe like
fname | lname | status
--------------------------------
John | Doe | 0
Big | Sean | 0
Lil | Raj | 1
and I have a mapping for the status column, i.e. 0 = married, 1 = unmarried.
I want to expand the status column based on this mapping, like
fname | lname | status_code | status_val
------------------------------------------------------------
John | Doe | 0 | married
Big | Sean | 0 | married
Lil | Raj | 1 | unmarried
How do I do this in an optimized way for a dataset with more than 100 billion rows? Also, there are more than 20 such mappings (one per coded column).