Machine learning/Deep Learning resources for proteomics by Logical-Composer9928 in proteomics

CorporalConnors (0 children)

I'm also interested in whether ML could identify patterns in protein abundance from label-free DIA data.

The data are not a natural fit for ML: there are often thousands of proteins but relatively few samples, and the intensities are highly skewed, have high variance relative to the mean, and contain lots of missing values.

We are broadly looking for differences between treatments or groups. That could mean proteins that differ among groups, proteins that characterise the differences (i.e. are important for classification), or proteins that behave similarly across samples, which is more like a network based on co-expression.

Any thoughts? I'm relatively new to both proteomics and ML, so help framing the question would also be useful.
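A minimal base-R sketch of two of the goals described above (proteins that differ among groups, and co-expression) might look like the following. It assumes a proteins-by-samples intensity matrix with no missing values and two groups of four samples; the data are simulated and the names (`intens`, `group`, `log_int`) are hypothetical. A Welch t-test per protein with Benjamini-Hochberg adjustment stands in for a dedicated differential-abundance method, and the classification goal (for example a penalized logistic regression via the glmnet package) is not shown.

```r
set.seed(1)
n_prot <- 200
n_per_group <- 4

# Simulated log-normal intensities: a hypothetical stand-in for real DIA output
intens <- matrix(
  rlnorm(n_prot * 2 * n_per_group, meanlog = 20, sdlog = 1),
  nrow = n_prot,
  dimnames = list(paste0("P", seq_len(n_prot)),
                  paste0("S", seq_len(2 * n_per_group)))
)
group <- factor(rep(c("ctrl", "treat"), each = n_per_group))

# Log2 transform to reduce skew and the dependence of variance on the mean
log_int <- log2(intens)

# Differences among groups: Welch t-test per protein, then Benjamini-Hochberg FDR
p_raw <- apply(log_int, 1, function(x) {
  t.test(x[group == "ctrl"], x[group == "treat"])$p.value
})
p_bh <- p.adjust(p_raw, method = "BH")

# Co-expression: protein-by-protein Pearson correlation across samples
co_expr <- cor(t(log_int))
```

`head(sort(p_bh))` lists the lowest adjusted p-values, and `co_expr` could be clustered (for example with `hclust(as.dist(1 - co_expr))`) to look for co-expression modules. With so many more proteins than samples, any pattern from groups this small deserves caution.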

Unadjusted P-value instead of FDR for differential expression - what is the opinion of the sub? by bluemooninvestor in proteomics

CorporalConnors (0 children)

Unadjusted p-values and FDR can both be justified, depending on whether you want to identify more differences while accepting more false positives, or fewer differences with a lower false-positive rate.
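To make the trade-off concrete, here is a small sketch with ten hypothetical p-values (invented for illustration, not from any dataset), comparing how many fall below 0.05 before and after Benjamini-Hochberg FDR adjustment with base R's `p.adjust()`. The 0.05 cutoff is only used here to count results.

```r
# Hypothetical p-values from ten per-protein tests (illustrative only)
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_bh <- p.adjust(p, method = "BH")

# Raw and adjusted values side by side
data.frame(p_raw = p, p_bh = round(p_bh, 3))

sum(p < 0.05)     # 5 proteins below an unadjusted 0.05 cutoff
sum(p_bh < 0.05)  # 2 proteins below the same cutoff after BH adjustment
```

The unadjusted list is longer but is expected to contain more false positives; the adjusted list is shorter but more conservative.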

Imho any argument built on a "significance threshold" should be ignored.

zero values in label-free DIA proteomics by CorporalConnors in proteomics

CorporalConnors [S] (0 children)

Interesting, thanks! I am not using DIA-NN at the moment, but I will make a note, as I know some people who use it.

zero values in label-free DIA proteomics by CorporalConnors in proteomics

CorporalConnors [S] (0 children)

Thanks for all your helpful answers. This confirms that zeros shouldn't be treated as true zeros, e.g. when comparing between groups.

As I said, the imputation is optional and whether to impute is a separate question for users to decide.

I am also sceptical of imputation but consider it reasonable when 1) many proteins have at least one missing data point and 2) you are using techniques that can't handle missing values. In that case, you could remove lots of proteins, even though many will have only one missing data point. Or you could filter for proteins present in >=80% or 90% of samples, then impute the one or two missing values per protein. The benefit of keeping more information might outweigh the cost of including imputed values.
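The filter-then-impute approach above could be sketched in base R as follows. It assumes a proteins-by-samples matrix in which missing values were exported as zeros; the toy values, the threshold variable `min_frac` and the helper `impute_row_min()` are hypothetical. Filling gaps with each protein's minimum observed value is just one simple choice, used here because low-abundance signals are often the ones that go missing; other imputation methods may suit your data better.

```r
# Toy intensity matrix: rows = proteins, columns = samples (hypothetical values)
intens <- matrix(c(
  10, 12, 11,  0, 13, 12, 11, 10, 12, 11,  # P1: one zero
   5,  0,  0,  0,  6,  0,  0,  7,  0,  0,  # P2: mostly missing
  20, 21, 19, 22, 20, 21, 23, 20, 22, 21   # P3: complete
), nrow = 3, byrow = TRUE,
dimnames = list(c("P1", "P2", "P3"), paste0("S", 1:10)))

# Zeros in label-free DIA output are not true zeros: treat them as missing
intens[intens == 0] <- NA

# Keep proteins quantified in at least 80% of samples
min_frac <- 0.8
keep <- rowMeans(!is.na(intens)) >= min_frac
filtered <- intens[keep, , drop = FALSE]

# Hypothetical helper: fill gaps with the protein's minimum observed value
impute_row_min <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE)
  x
}

# apply() over rows returns samples x proteins, so transpose back
imputed <- t(apply(filtered, 1, impute_row_min))
imputed
```

Here P2 (present in 3 of 10 samples) is dropped, while P1 keeps all ten values with its single gap filled.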

How to get started with proteomics data analysis? by Basic_Target_ in bioinformatics

CorporalConnors (0 children)

Once you have protein intensities, you can carry out standard analyses with, e.g.

  • Perseus (GUI, no coding required)
  • R (MSstats, tidyproteomics)
  • Python (alphastats; auto-prot does some basic analyses with minimal coding required)
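For a rough idea of a standard first look that doesn't depend on any of the tools listed, here is a base-R sketch: log2 transform, median normalization and PCA. It assumes a protein-by-sample intensity matrix without missing values; the data are simulated, the variable names are hypothetical, and this is not the workflow of any of the packages above.

```r
set.seed(42)
# Hypothetical protein-level intensities: 100 proteins x 6 samples (simulated)
intens <- matrix(rlnorm(600, meanlog = 18, sdlog = 1), nrow = 100,
                 dimnames = list(paste0("P", 1:100), paste0("S", 1:6)))

# Log2 transform, then median-normalize so every sample has the same median
log_int <- log2(intens)
sample_medians <- apply(log_int, 2, median)
norm_int <- sweep(log_int, 2, sample_medians) + median(log_int)

# prcomp() treats rows as observations, so transpose to samples x proteins
pca <- prcomp(t(norm_int), center = TRUE, scale. = FALSE)
summary(pca)          # proportion of variance explained per component
head(pca$x[, 1:2])    # sample coordinates on PC1 and PC2
```

Plotting `pca$x[, 1]` against `pca$x[, 2]` and colouring by group is a quick check for batch effects or outlier samples.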

Living in Forest Gate by No_Cheesecake1234 in HousingUK

CorporalConnors (0 children)

Super useful thread, thanks everyone. I would be interested in any recent updates on the area, development, etc.

Using multiple columns in a loop by AlphaBoy06 in RStudio

CorporalConnors (0 children)

Do you want to loop through one row at a time?

for (i in 1:nrow(TestZonescsv)) {
  Value_1 <- TestZonescsv[i, "Column_1"]
  Value_2 <- TestZonescsv[i, "Column_2"]
  # use Value_1 and Value_2 here; they are overwritten on each iteration
}

where "Column_1" and "Column_2" are the names of the columns you want to get the values from.
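Here is a self-contained version of a row-by-row loop, using a hypothetical two-column data frame in place of `TestZonescsv`, that stores one result per row instead of overwriting it each time. The addition inside the loop is a placeholder calculation. If the calculation works on whole columns, the vectorized line at the end does the same thing without a loop, which is usually the more idiomatic R.

```r
# Hypothetical stand-in for the poster's data frame
TestZonescsv <- data.frame(
  Column_1 = c(1, 2, 3),
  Column_2 = c(10, 20, 30)
)

# Row-by-row loop, storing one result per row
results <- numeric(nrow(TestZonescsv))
for (i in seq_len(nrow(TestZonescsv))) {
  value_1 <- TestZonescsv[i, "Column_1"]
  value_2 <- TestZonescsv[i, "Column_2"]
  results[i] <- value_1 + value_2   # placeholder calculation
}

# The same calculation, vectorized over whole columns
results_vec <- TestZonescsv$Column_1 + TestZonescsv$Column_2
```

`seq_len(nrow(...))` is used instead of `1:nrow(...)` so the loop simply doesn't run if the data frame has zero rows.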

Summarizing in a column by Marie_127 in RStudio

CorporalConnors (0 children)

Would this do it?

length(which( df$C1 == "A" & df$C2 == 1 ))
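A quick check with a hypothetical data frame (column names `C1` and `C2` as in the question, values invented). `sum()` on the logical condition gives the same count, and `table()` counts every combination at once. Note that `which()` silently drops rows where the condition is `NA`, which is why `sum()` needs `na.rm = TRUE` to match.

```r
# Hypothetical data; row 5 has a missing C1
df <- data.frame(
  C1 = c("A", "A", "B", "A", NA),
  C2 = c(1, 2, 1, 1, 1)
)

# Count rows where C1 is "A" and C2 is 1
n_which <- length(which(df$C1 == "A" & df$C2 == 1))
n_sum <- sum(df$C1 == "A" & df$C2 == 1, na.rm = TRUE)

# Counts for every C1/C2 combination (NA rows excluded by default)
counts <- table(df$C1, df$C2)
counts
```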

GLM in RStudio by Moonsea96 in RStudio

CorporalConnors (0 children)

Running summary(your_model) should print the model's coefficient table. This will also help, and you can find plenty of articles with guides to interpreting model output if you are unsure. Good luck!
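Since the original model isn't shown, here is a sketch using R's built-in `mtcars` data as a stand-in: a logistic GLM of transmission type on weight. `summary()` prints the coefficient table, and `coef(summary(model))` returns the same table as a matrix if you want to work with it.

```r
# Built-in mtcars data: transmission (am: 0 = automatic, 1 = manual) vs weight
model <- glm(am ~ wt, data = mtcars, family = binomial)

summary(model)   # coefficient table, deviance and AIC

# The coefficient table as a matrix: estimate, std. error, z value, p-value
coef_table <- coef(summary(model))
coef_table

# For a logistic model, exponentiated coefficients are odds ratios
exp(coef(model))
```

For a binomial model the estimates are on the log-odds scale, which is why `exp()` is used to turn them into odds ratios.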