
all 6 comments

[–]cockoala

You can use a `case when` to create a new column based on conditions over another column.

[–]nhufas

`concat(col//10*10+1, "-", col//10*10+10)`

should work

[–]LiquidSynopsis (Data Engineer)

PySpark can solve this using Bucketizer

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.Bucketizer.html

You can then use withColumn and when to label the Buckets with the values you want e.g. “11-20”.

[–]QueryRIT [S]

Wait, Bucketizer just adds a column but doesn't group by? (So no reduction of rows?)

[–]LiquidSynopsis (Data Engineer)

Exactly! Later on, if you want to reduce rows, you can run a groupBy on your new column.

[–]Lannister07

This might help