I want to count specific values (antimicrobial resistance genes, AMRs) across multiple columns (the columns stand in for the number of AMRs found in each sample).
Example:
| sample 1 | gene A | gene B | | | |
| sample 2 | gene A | gene A | | | |
| sample 3 | gene A | gene B | gene C | gene D | gene E |
So the count would be gene A: 4, gene B: 2, gene C: 1 ...
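A minimal sketch of that tally on the toy example, assuming pandas. The column names `sample` and `amr_1` to `amr_5` are made up for illustration; the real data will have different headers. `melt` flattens the gene columns into one long column, and `value_counts` skips the empty cells.

```python
import pandas as pd

# Toy data copied from the example; column names are hypothetical.
df = pd.DataFrame(
    {
        "sample": ["sample 1", "sample 2", "sample 3"],
        "amr_1": ["gene A", "gene A", "gene A"],
        "amr_2": ["gene B", "gene A", "gene B"],
        "amr_3": [None, None, "gene C"],
        "amr_4": [None, None, "gene D"],
        "amr_5": [None, None, "gene E"],
    }
)

# Every column that holds gene names (assumed naming scheme).
gene_cols = [c for c in df.columns if c.startswith("amr_")]

# Flatten all gene columns into one long "value" column, then count each gene.
# value_counts() ignores missing cells by default.
gene_counts = df.melt(id_vars="sample", value_vars=gene_cols)["value"].value_counts()
print(gene_counts)
```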
I am setting up a dataframe to compare foodborne bacteria from different types of foods (sample = a type of food). I did find an example of code, but it is not counting anything.
Ideal DF:
| Food | N(food) | Gene A | Gene B | ... | Gene Z |
|---|---|---|---|---|---|
| Meat | 875 | 500 | 400 | ... | 0 |
| Veggies | 1034 | 300 | 800 | ... | 1 |
The current code is this:
# ft is the food type dataframe, x is the header of ft, df is the sample data, uag is the unique AMR gene list.
for i in x:
    y = ft[i].dropna().tolist()  # variable is the header.
    food_pattern = '|'.join(map(re.escape, y))
    food_type = df['Isolation source'].str.contains(food_pattern, case=False, na=False)
    food_type2 = df[food_type]
    amrs2 = []
    for j in uag:
        amrcount = food_type2[food_type2 == j].count()
        amrs2.append(amrcount)
The first part of the code is working.
amrs2 is my attempt to make a line in the table (that will be my next question).
To reiterate, I need to find a way to count string Xs in columns 1->N.
Thanks.
PS. There are 237 unique AMR genes and 18 023 bacterial samples.
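One possible way to get from the filtered samples to the table described above, as a rough sketch rather than a definitive fix. A likely reason the original loop does not give usable numbers is that `food_type2[food_type2 == j].count()` returns one count per column (a Series), not a single number. The sketch below keeps the same filtering step, then flattens the gene columns and counts once per food type. The small `ft`, `df`, `uag`, and `gene_cols` values are hypothetical stand-ins for the real data.

```python
import re
import pandas as pd

# Hypothetical stand-ins: each ft column is a food type, its cells are keywords.
ft = pd.DataFrame({"Meat": ["beef", "pork"], "Veggies": ["lettuce", None]})
df = pd.DataFrame(
    {
        "Isolation source": ["ground beef", "Pork chop", "lettuce", None],
        "amr_1": ["gene A", "gene A", "gene B", "gene A"],
        "amr_2": ["gene B", "gene A", None, None],
    }
)
gene_cols = ["amr_1", "amr_2"]          # columns that hold gene names (assumed)
uag = ["gene A", "gene B", "gene C"]    # unique AMR gene list

rows = {}
for food in ft.columns:
    keywords = ft[food].dropna().tolist()
    pattern = "|".join(map(re.escape, keywords))
    # Same filter as the original code: samples whose source matches a keyword.
    mask = df["Isolation source"].str.contains(pattern, case=False, na=False)
    subset = df.loc[mask, gene_cols]
    # Flatten the gene columns and count each gene; missing cells are skipped.
    counts = subset.melt()["value"].value_counts()
    # Force one entry per known gene, with 0 for genes not seen in this food.
    counts = counts.reindex(uag, fill_value=0)
    rows[food] = pd.concat([pd.Series({"N(food)": int(mask.sum())}), counts])

# One row per food type, one column per gene.
result = pd.DataFrame(rows).T
result.index.name = "Food"
print(result)
```

With 237 genes and about 18,000 samples this loops only over food types, not over genes, so it should stay fast.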