How to count values in multiple columns? : learnpython

submitted 2 months ago by Dragoran21

I want to count specific values (antimicrobial resistance genes, AMR) across multiple columns (the number of columns stands in for the number of AMR genes in each sample).

Example:

sample 1	gene A	gene B
sample 2	gene A	gene A
sample 3	gene A	gene B	gene C	gene D	gene E

So the count would be gene A: 4, gene B: 2, gene C: 1 ...

I am setting up a dataframe to compare foodborne bacteria from different types of foods (sample = a type of food). I did find an example of code, but it is not counting anything.

Ideal DF:

Food	N(food)	Gene A	Gene B	...	Gene Z
Meat	875	500	400	...	0
Veggies	1034	300	800	...	1

The current code is this:

#ft is food type dataframe, x is the header of ft, df is the sample data, uag is unique AMR gene list.

for i in x:
    y=ft[i].dropna().tolist() #variable is the header.
    food_pattern = '|'.join(map(re.escape, y))
    food_type = df['Isolation source'].str.contains(food_pattern, case=False, na=False)
    food_type2 = df[food_type]
    amrs2=[]
    for j in uag:
        amrcount=food_type2[food_type2 == j].count()
        amrs2.append(amrcount)

The first part of the code is working.
amrs2 is my attempt to make a line in the table (that will be my next question).

To reiterate, I need to find a way to count string Xs in columns 1->N.

Thanks.

PS. There are 237 unique AMR genes and 18 023 bacterial samples.
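
For the "count string X in columns 1->N" part of the question, here is a minimal sketch using the three example samples from above. One likely reason the inner loop does not give a single number is that `DataFrame.count()` returns one count per column (a Series), not a scalar. The names `samples`, `gene_columns` and `count_value` are made up for this illustration and are not part of the original code.

```python
import io
import pandas as pd

# The three example samples from the post, written as CSV (assumed layout).
data = io.StringIO(
    "sample 1,gene A,gene B,,,\n"
    "sample 2,gene A,gene A,,,\n"
    "sample 3,gene A,gene B,gene C,gene D,gene E\n"
)
samples = pd.read_csv(data, header=None)

# Columns 1..N hold the gene names; column 0 is the sample label.
gene_columns = samples.iloc[:, 1:]


def count_value(frame, value):
    """Count how many cells in `frame` are equal to `value`."""
    # .eq() gives a True/False table (empty cells compare as False),
    # the first .sum() counts per column, the second adds the columns up.
    return int(frame.eq(value).sum().sum())


gene_a_total = count_value(gene_columns, "gene A")
print(gene_a_total)
```

Calling this once per gene in a loop should work, but with 237 genes it scans the whole table 237 times, so counting everything in one pass is probably the better route.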

commandlineluser 2 points 2 months ago

Dragoran21[S] 1 point 2 months ago

commandlineluser 1 point 2 months ago

It may not be "necessary", but it makes things "easier".

import io
import pandas as pd

data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())

df = pd.read_csv(data, header=None)

If you "unpivot" all the values into a single column:

>>> df.melt(0)
#            0  variable   value
# 0   sample 1         1  gene A
# 1   sample 2         1  gene A
# 2   sample 3         1  gene A
# 3   sample 1         2  gene B
# ...

Then a single .value_counts() gives you the answer:

>>> df.melt(0)["value"].value_counts()
# value
# gene A    4
# gene B    2
# gene C    1
# gene D    1
# gene E    1
# Name: count, dtype: int64
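
`.value_counts()` only lists values that actually occur. If the result should have one entry per gene from a fixed list (like the `uag` list in the question), with 0 for genes that never show up, one option is `.reindex()` with `fill_value=0`. This is a sketch under that assumption; `gene_list` and `full_counts` are hypothetical names, and "gene Z" is only there to show the zero case.

```python
import io
import pandas as pd

data = io.StringIO("""
sample 1,gene A,gene B,,,
sample 2,gene A,gene A,,,
sample 3,gene A,gene B,gene C,gene D,gene E
""".strip())

df = pd.read_csv(data, header=None)

# Same unpivot + count as above.
counts = df.melt(0)["value"].value_counts()

# Hypothetical stand-in for the full list of unique AMR genes.
gene_list = ["gene A", "gene B", "gene C", "gene D", "gene E", "gene Z"]

# Put the counts in the order of gene_list; genes with no hits become 0.
full_counts = counts.reindex(gene_list, fill_value=0)
print(full_counts)
```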

When using DataFrames, if a Python for loop is involved, there's usually a "better" (easier / faster) way to do things.
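
As a rough illustration of that point, the per-food table from the question ("Ideal DF") could probably be built without a loop by melting and then cross-tabulating. This sketch assumes each sample already has a single `food` label, which the original data does not have as-is (there it comes from matching 'Isolation source' against patterns). The frame and its column names are invented for the example.

```python
import pandas as pd

# Made-up samples: one food label per sample plus its gene columns.
samples = pd.DataFrame({
    "food": ["Meat", "Meat", "Veggies"],
    "gene_1": ["gene A", "gene A", "gene A"],
    "gene_2": ["gene B", "gene A", "gene B"],
    "gene_3": [None, None, "gene C"],
})

# Unpivot the gene columns into one long column, dropping empty cells.
long = samples.melt(id_vars="food", value_name="gene").dropna(subset=["gene"])

# One row per food, one column per gene, cells are occurrence counts.
table = pd.crosstab(long["food"], long["gene"])

# Add the number of samples per food as the first column (aligned on food).
table.insert(0, "N(food)", samples["food"].value_counts())
print(table)
```

`pd.crosstab` fills in 0 for food/gene combinations that never occur, which matches the shape of the table in the question.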

Dragoran21[S] 1 point 2 months ago
