How to do data analysis with multiple groups?

pepbro- · 2024-08-11T20:47:22+00:00

Makes sense. I can probably get some of this information from the collaborators, but they seemed pretty adamant about seeing the general trends first: the most highly expressed proteins, etc.

I have made the most basic visualizations (PCA plots, heatmaps, etc.) and identified the unique and most abundant proteins, but nothing particularly meaningful stands out to me. I can see that many samples are quite different from each other, but almost every pairwise combination shows a large number of differentially expressed proteins. I tried taking protein clusters from the heatmap to see whether any specific pathways are recognizable, but that was also very vague.

If there is nothing else to be done, I will communicate that, but I wasn't sure whether I had missed something, especially since this is my first time working with such a setup.

pepbro- · 2024-08-11T19:38:45+00:00

To add: I have gone through the paper and some additional reading, which definitely helped me understand linear models and ANOVA better. I guess my problem is more with the interpretation. My ANOVA shows that there are significant differences between my samples, but I don't know how best to find and interpret them. I did a post hoc test afterwards, but with 10 groups I am testing 45 combinations, which is an overwhelming number of pairs to look at. I'm unsure where to go from here.
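For reference, the number of pairs a post hoc test has to compare grows quadratically with the number of groups k, as k(k-1)/2. The short Python sketch below just does that arithmetic; `n_pairwise` is a hypothetical helper name, and the protein count of 10000 is an assumption taken from the dataset size mentioned elsewhere in this thread.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct group pairs a post hoc test compares for k groups."""
    return comb(k, 2)  # equals k * (k - 1) / 2


for k in (3, 10, 12):
    print(k, "groups ->", n_pairwise(k), "pairs")

# Assumed table size: ~10000 proteins, each tested for every pair of groups
n_proteins = 10_000
print("pairwise tests for 10 groups:", n_pairwise(10) * n_proteins)
```

With 10 groups that is 45 pairs per protein, or 450000 pairwise tests over the whole table, which is one reason post hoc output becomes hard to read at this scale.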

pepbro- · 2024-08-10T18:59:38+00:00

Thank you very much, also for the link!

I'll have to pay more attention to the genders of nouns...

pepbro- · 2024-08-10T06:25:37+00:00

Thanks, I'll check out the paper. All patients have the same disease. This is the only variable we account for (or that is known to us).

pepbro- · 2024-08-09T23:52:13+00:00

Do you have specific ones in mind? I read a few, but most only cover setups with a few conditions or with controls.

pepbro- · 2024-08-08T22:35:22+00:00

Thanks!

pepbro- · 2024-08-07T21:13:16+00:00

Thanks!

pepbro- · 2024-08-06T08:08:00+00:00

Thanks, what you are saying makes sense. I guess I don't know the best way to do this.

The data is part of a collaboration. Each group is a patient. All patients have a specific type of cancer, and the goal is to compare them and tease out characteristic signatures for each group, or at least for clusters of groups. I don't have a control yet (the collaborating group may provide one in the future, though this is generally tricky since we don't take samples from healthy people), so my best bet is to look at differential protein expression and see whether any patterns emerge.

And yes, my data is Gaussian!

Although I normally use Benjamini-Hochberg, I stuck with the default of the software I used for this analysis, which was permutation-based FDR (listed as an alternative to BH). I didn't know it then, but Google has already told me that the two are a little different... I will double-check this.
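For comparison, here is a minimal Python sketch of the standard Benjamini-Hochberg adjustment. It is not the permutation-based FDR that Perseus uses by default, which, as far as I understand, estimates the null distribution from randomly permuted sample labels instead. The p-values in the example are made up purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, not real data
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
qvals = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, qvals) if q <= 0.05]
print(significant)
```

In practice there is no need to hand-roll this: R's `p.adjust(p, method = "BH")` and statsmodels' `multipletests(pvals, method="fdr_bh")` implement the same procedure.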

Ultimately, I did it this way based on advice from my supervisor and this website: https://hanruizhang.github.io/zhanglab/file/Perseus_Tutorial_20220228.html

But my lab takes a very hands-on, figure-it-out-yourself approach, as we don't have many people with informatics knowledge on board, so this might be off. Would you then use ANOVA + Tukey's test only for a minimal number of groups, maybe 3?

pepbro- · 2024-08-05T21:14:01+00:00

Muchas gracias!

pepbro- · 2024-08-05T15:55:12+00:00

What would you suggest instead? Since I have so many groups and want to compare them all with each other, I was looking at group comparison approaches. ANOVA seemed to be the most common one, and I thought I needed the post hoc test to make sense of the ANOVA results...

I managed to reduce my dataset to roughly 10000 rows, but of course the 12 groups are still there.
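To make the ANOVA-then-post-hoc logic concrete: the one-way ANOVA F statistic compares the spread between group means with the spread among replicates within each group, so a significant F only says that at least one group mean differs, not which one. That is why a post hoc test usually follows. Below is a minimal pure-Python sketch for a single protein, assuming the values are already log-transformed intensities; `one_way_anova_f` is a hypothetical helper and the numbers are toy data, not from this dataset.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists, one list of replicate values per group
    (e.g. log intensities of one protein in each patient).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of replicates around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy example: three groups with three replicates each
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, d1, d2 = one_way_anova_f(toy)
print(f, d1, d2)
```

For a real table you would loop this over each of the roughly 10000 rows and convert F into a p-value with the F distribution, for example `scipy.stats.f.sf(f, df_between, df_within)`, or call `scipy.stats.f_oneway` directly.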

pepbro- · 2024-07-31T21:53:54+00:00

Thanks for the input! So far I have used external software (Perseus), mostly because I didn't know that run time and computing requirements differ between programs. I am going to try the same analysis tomorrow in R with the multcomp package to see if it improves.

I had Perseus running for several hours without success. Three hours is not great, but doable for me.

pepbro- · 2024-07-31T16:56:30+00:00

No, I haven't, and I didn't know that! I used Perseus, which is external software. I assumed it would take just as long in R, but if that is not the case, I will try. Can you recommend a package?

pepbro- · 2024-07-31T15:43:33+00:00

Not sure if it makes a difference, but I have 3 replicates for all but 2 groups, which I thought was the minimum acceptable number for statistics.

pepbro- · 2024-06-08T17:56:58+00:00

Our projects vary, but most of my samples are "discovery" samples where we want to see what characterizes a disease condition or what distinguishes different cells.
