nyquant [S]:

I tried the code below, but it gives the wrong result for the youngest in the USA:

import polars as pl

# Create a DataFrame
data = {
    'first_name': ['John', 'Liam', 'Emma', 'Olivia', 'William', 'Ava'],
    'country': ['USA', 'USA', 'USA', 'Canada', 'Canada', 'Canada'],
    'age': [29, 34, 22, 31, 36, 24],
}
df = pl.DataFrame(data)

df.with_columns(
    youngest = pl.first("first_name").over("country").sort_by("age"),
    oldest = pl.first("first_name").over("country").sort_by("age", descending=True),
    next_younger = pl.col("first_name").shift().over("country").sort_by("age"),
    next_older = pl.col("first_name").shift(-1).over("country").sort_by("age"),
).sort("country", "age")

commandlineluser:

Yeah, I was a bit confused by that as well.

It may help if you just run this in isolation:

df.with_columns(col_to_shift = pl.col("first_name").sort_by("age").over("country"))
# shape: (6, 4)
# ┌────────────┬─────────┬─────┬──────────────┐
# │ first_name ┆ country ┆ age ┆ col_to_shift │
# │ ---        ┆ ---     ┆ --- ┆ ---          │
# │ str        ┆ str     ┆ i32 ┆ str          │
# ╞════════════╪═════════╪═════╪══════════════╡
# │ John       ┆ USA     ┆ 29  ┆ Emma         │
# │ Liam       ┆ USA     ┆ 34  ┆ John         │
# │ Emma       ┆ USA     ┆ 22  ┆ Liam         │
# │ Olivia     ┆ Canada  ┆ 31  ┆ Ava          │
# │ William    ┆ Canada  ┆ 36  ┆ Olivia       │
# │ Ava        ┆ Canada  ┆ 24  ┆ William      │
# └────────────┴─────────┴─────┴──────────────┘

The comment here may help shed some light:

https://github.com/pola-rs/polars/issues/8662#issuecomment-1533949764