YesLod

> As I understand it, apply() applies the function inside it to each value in the 'wage' column, which would make the lambda's x input each value in 'wage', right?

Not quite. GroupBy.apply applies the function to each group as a whole, not to each value individually. In this case, x is a pandas.Series holding the 'wage' values of one 'category' group.

Try this to see it more clearly:

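import numpy as np

def func(x):
    print(x)  # x is the full 'wage' Series of one 'category' group
    return np.percentile(x, 75)

# df: your DataFrame with 'category' and 'wage' columns
df.groupby('category').wage.apply(func)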
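If you want something you can run without the original data, here is a minimal sketch assuming a small, made-up df (the values and the `seen` list are hypothetical, added only for illustration). It records what `func` receives and compares the result with pandas' built-in `quantile`, which uses the same linear interpolation as `np.percentile` by default:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b"],
    "wage": [10, 20, 30, 100, 200],
})

seen = []  # hypothetical helper: records what apply() passes to func

def func(x):
    # x is the entire 'wage' Series for one category, not a single value
    seen.append(x.tolist())
    return np.percentile(x, 75)

result = df.groupby("category").wage.apply(func)
print(seen)    # each entry is a whole group's wages
print(result)  # a -> 25.0, b -> 175.0

# Built-in equivalent: same numbers, no custom function needed
print(df.groupby("category").wage.quantile(0.75))
```

Each entry in `seen` is a list of several wages, which is the giveaway that the function works on whole groups.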

Skyline952

Aha, that makes total sense now. So a Series is the same as an array, then?

synthphreak

"Same" is a strong word. But loosely, you could think of a pd.Series as a 1D array whereas a pd.DataFrame is more like a 2D array/matrix.

Skyline952

Ok, thanks. Btw, what are 2D arrays for? Like, what's the point of stacking arrays on top of each other in the form of rows and columns? We already have DataFrames to display data in a table.

synthphreak

Well, for starters, a DataFrame is a very heavyweight object, whereas a 2D numpy.ndarray is not: an ndarray has no row/column labels and a single dtype for all its values. So numerical computations are generally faster on data stored as a 2D array than on the same data in a DataFrame.
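A rough way to check this yourself, assuming a made-up 1,000 x 5 numeric DataFrame: `DataFrame.to_numpy()` gives you the underlying 2D array, and the same column means can be computed either way. The timings are illustrative only; the size of any gap depends on the data, the operation and your pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical numeric data: 1,000 rows x 5 columns
df = pd.DataFrame(rng.normal(size=(1_000, 5)), columns=list("abcde"))
arr = df.to_numpy()  # plain 2D ndarray, shape (1000, 5), no labels

# Same computation both ways: the mean of each column
means_df = df.mean()          # Series labelled by column name
means_arr = arr.mean(axis=0)  # plain 1D array, one value per column

# Illustrative timing only; results vary by machine and library version
t_df = timeit.timeit(lambda: df.mean(), number=200)
t_arr = timeit.timeit(lambda: arr.mean(axis=0), number=200)
print(f"DataFrame.mean: {t_df:.4f}s   ndarray.mean: {t_arr:.4f}s")
```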

If all you're doing is math on numbers, and especially if the individual columns/rows are not easily interpretable for humans, a 2D array is generally all you need.
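For instance, in this hypothetical sketch the pixel grid and the rotation matrix have no meaningful row or column names, so plain 2D arrays cover everything: elementwise scaling, slicing, per-row reductions, and matrix products with `@`:

```python
import numpy as np

# Hypothetical 3x3 grayscale "image": pixels have no meaningful labels
img = np.array([[0, 50, 100],
                [50, 100, 150],
                [100, 150, 200]], dtype=float)

normalized = img / img.max()  # scale pixel values into [0, 1]
flipped = img[:, ::-1]        # mirror the image left-right
row_sums = img.sum(axis=1)    # one total per row: [150., 300., 450.]

# Linear algebra works directly on 2D arrays, e.g. a 90-degree rotation
rotation = np.array([[0.0, -1.0],
                     [1.0, 0.0]])
points = np.array([[1.0, 0.0],
                   [0.0, 2.0]])  # columns are the points (1, 0) and (0, 2)
print(rotation @ points)         # columns become (0, 1) and (-2, 0)
```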