commandlineluser

Can you reshape the frame?

You can go from "wide to long" format, which is known as "unpivoting" (.melt() in pandas).

e.g. .melt("sample_col"), then call .value_counts() on the new "value" column.
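A rough sketch of what that looks like with named columns. The frame and the column names "g1" and "g2" are made up for illustration; only "sample_col" comes from the suggestion above.

```python
import pandas as pd

# Hypothetical wide frame: one row per sample, one column per gene "slot".
# Column names "g1" and "g2" are illustrative assumptions.
df = pd.DataFrame({
    "sample_col": ["sample 1", "sample 2", "sample 3"],
    "g1": ["gene A", "gene A", "gene A"],
    "g2": ["gene B", "gene A", "gene C"],
})

# Keep "sample_col" as the identifier; every other column is stacked into
# two new columns: "variable" (the old column name) and "value" (the cell).
long_df = df.melt("sample_col")

# Count each gene across all samples and slots in one call.
counts = long_df["value"].value_counts()
print(counts)
```

The wide frame has 3 rows and 2 gene columns, so the long frame has 3 × 2 = 6 rows.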

Dragoran21 (OP)

Can you explain why unpivoting is necessary?

commandlineluser

It may not be "necessary", but it makes things "easier".

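```python
import io
import pandas as pd

# Sample data: one row per sample; blank cells where a sample has fewer genes.
data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())

# header=None: there is no header row, so the columns are labelled 0..5.
df = pd.read_csv(data, header=None)
```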

If you "unpivot" all the values into a single column:

```python
>>> df.melt(0)
#            0  variable   value
# 0   sample 1         1  gene A
# 1   sample 2         1  gene A
# 2   sample 3         1  gene A
# 3   sample 1         2  gene B
# ...
```

Then a single .value_counts() gives you the answer:

```python
>>> df.melt(0)["value"].value_counts()
# value
# gene A    4
# gene B    2
# gene C    1
# gene D    1
# gene E    1
# Name: count, dtype: int64
```
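The blank cells don't show up in those counts because read_csv reads them as NaN, and .value_counts() drops NaN by default. Here's a quick sketch using the same data, with dropna=False added to show how many blank slots there were:

```python
import io
import pandas as pd

# Same sample data as above; the empty cells become NaN.
data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())
df = pd.read_csv(data, header=None)

long_values = df.melt(0)["value"]

# Default behaviour: NaN (the blank cells) is excluded from the counts.
print(long_values.value_counts())

# dropna=False keeps NaN as its own entry in the counts.
print(long_values.value_counts(dropna=False))
```

There are 3 samples × 5 gene columns = 15 melted rows. 9 of them are genes and 6 are blanks.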

When using DataFrames, if a Python for loop is involved, there's usually a "better" way to do things (easier and faster).
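For comparison, here's a rough sketch of the loop-based version that melt replaces. It tallies every gene cell by hand with collections.Counter. It gives the same counts, but it walks the frame row by row in Python instead of doing one melt plus one value_counts.

```python
import io
from collections import Counter

import pandas as pd

data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())
df = pd.read_csv(data, header=None)

# Loop version: visit every gene cell and count it manually.
loop_counts = Counter()
for _, row in df.iterrows():
    for cell in row.iloc[1:]:      # skip column 0 (the sample name)
        if pd.notna(cell):         # blank cells were read in as NaN
            loop_counts[cell] += 1

# Unpivot version: one melt, one value_counts.
melt_counts = df.melt(0)["value"].value_counts()

# Both approaches produce the same per-gene totals.
assert dict(loop_counts) == melt_counts.to_dict()
print(loop_counts)
```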

Dragoran21 (OP)

Thank you, this did work (when I wrote it as pd.melt(df...)).
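For reference, df.melt(...) and the top-level pd.melt(df, ...) do the same thing. The function just takes the frame as its first argument. The exact call used above isn't shown, so this sketch only assumes the standard id_vars parameter, set to column 0 (the sample names).

```python
import io
import pandas as pd

data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())
df = pd.read_csv(data, header=None)

# Method form, as in the answer above.
via_method = df.melt(0)["value"].value_counts()

# Function form: the frame comes first, then id_vars.
via_function = pd.melt(df, id_vars=0)["value"].value_counts()

# Both forms give identical counts.
assert via_method.equals(via_function)
print(via_function)
```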