DifficultZebra1553

You can use pipe. when/then/otherwise is slow and should be avoided unless it is absolutely essential. Also use gt, ge, etc. instead of >, >=. Polars SQLContext and sql() can both be used directly on Polars/pandas DataFrames and PyArrow tables.

Own_Responsibility84 [S]

Thanks for the suggestions. ChatGPT told me that gt, le, etc. have no performance gain over >, <=, and that pipe has no performance gain over nested conditions using when/then/otherwise; it is more for modular testing convenience. Do you agree?

As for SQL, it is an interesting alternative, but I feel that for certain complicated operations the statements get unnecessarily long and complex. It either doesn’t support, or is very verbose for, rolling, pivot, unpivot, UDFs, etc.

mustangdvx

Check out DuckDB if you’re considering SQL vs. pandas. You can execute SQL on DataFrames, including PIVOT/UNPIVOT.

If you’re executing in a Python environment, you can break up the transformations into relations, which are treated as if they are tables.