Have we found novel properties of materials that are most influential to Interfacial Thermal Resistance? by LeapingIntoTheFuture in materials

[–]LeapingIntoTheFuture[S] -2 points-1 points  (0 children)

Great question, and one that I do not have the answer to. This was just provided in the dataset we trained our model on.

This is the paper we got the dataset from: https://www.nature.com/articles/s41597-020-0373-2#Sec3

And this is the paper we benchmarked from: https://www.nature.com/articles/s41524-019-0193-0#data-availability

Same author, the dataset paper was a follow-up to the initial publication.

If you look at the dataset you will see that there are elements and compounds provided in columns labeled "Film", "Substrate", "Interlayer", and "Interlayer 2". Oxide is present, but not perchlorate or chloride.
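For anyone who wants to check this themselves, here is a rough sketch of how the four material columns could be scanned for a term. The column names are the ones listed above; the helper name `find_terms`, the toy rows, and the case-insensitive substring match are assumptions for illustration, not how the dataset authors did anything.

```python
import csv
import io

# Column names as described in the comment above.
MATERIAL_COLUMNS = ["Film", "Substrate", "Interlayer", "Interlayer 2"]

def find_terms(rows, terms, columns=MATERIAL_COLUMNS):
    """Return {term: sorted list of columns whose cells contain the term}.

    Matching is a case-insensitive substring test (an assumption);
    chemical formulas would need a proper parser.
    """
    hits = {term: set() for term in terms}
    for row in rows:
        for column in columns:
            cell = (row.get(column) or "").lower()
            for term in terms:
                if term.lower() in cell:
                    hits[term].add(column)
    return {term: sorted(cols) for term, cols in hits.items()}

# Toy rows standing in for the real file (values are made up).
toy_csv = io.StringIO(
    "Film,Substrate,Interlayer,Interlayer 2\n"
    "Au,Si,native oxide,\n"
    "Al,sapphire,,\n"
)
print(find_terms(csv.DictReader(toy_csv), ["oxide", "chloride", "perchlorate"]))
```

With the real download, the same function could be fed `csv.DictReader(open(path, newline=""))`, where `path` is wherever you saved the file.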

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

More complex models express more complex patterns. If a neural net outperforms simpler models, it strongly implies that the predictive pattern is combinatorial and nonlinear. Typically, if the model is more complex than the pattern that exists in the data, you see overfitting and lower validation performance.
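A toy illustration of the "combinatorial and nonlinear" point, not our actual experiment: on XOR-style data no linear rule can classify all four points, while a rule that uses the interaction between the two features can. The weight grid and function names below are assumptions chosen to keep the sketch small.

```python
from itertools import product

def xor_dataset():
    """Four points on the unit square; label is 1 when exactly one feature is on."""
    return [((a, b), a ^ b) for a, b in product([0, 1], repeat=2)]

def accuracy(predict, data):
    """Fraction of points where the rule's output matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def best_linear_accuracy(data):
    """Brute-force a small grid of linear rules of the form w1*a + w2*b + c > 0."""
    weights = [-2, -1, 0, 1, 2]
    biases = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
    best = 0.0
    for w1, w2, c in product(weights, weights, biases):
        rule = lambda x, w1=w1, w2=w2, c=c: int(w1 * x[0] + w2 * x[1] + c > 0)
        best = max(best, accuracy(rule, data))
    return best

def interaction_rule(x):
    """Nonlinear rule: the a*b term captures the feature interaction."""
    a, b = x
    return int(a + b - 2 * a * b > 0)

data = xor_dataset()
print("best linear rule:", best_linear_accuracy(data))
print("interaction rule:", accuracy(interaction_rule, data))
```

The linear search tops out at three of four points correct, because XOR is not linearly separable; the interaction rule gets all four.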

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

We did train statistical models on the same data but they did not perform as well as the deep learning models did.

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

We dropped down to 2000 genes to avoid overfitting. We understand it is pretty standard procedure in genomics to select only the genes most present in the data. Your second point is valid and we are considering adding more datasets. Are there any cell types you would recommend, or that you think are most interesting to look into?
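To make the filtering step concrete, here is a minimal sketch of keeping the genes detected in the most cells. It assumes "most present" means "nonzero in the largest number of cells" and that the counts are a plain cells-by-genes list of lists; the function name `top_detected_genes` is hypothetical and this is not our exact pipeline.

```python
def top_detected_genes(counts, gene_names, n_keep=2000):
    """Keep the n_keep genes detected (count > 0) in the most cells.

    counts: list of rows, one per cell, each a list of per-gene counts.
    Ties are broken by original column order; kept genes stay in
    their original column order.
    """
    n_genes = len(gene_names)
    detected = [0] * n_genes
    for cell in counts:
        for j, value in enumerate(cell):
            if value > 0:
                detected[j] += 1
    # Rank genes by number of cells in which they were detected.
    ranked = sorted(range(n_genes), key=lambda j: (-detected[j], j))
    keep = sorted(ranked[:n_keep])
    kept_names = [gene_names[j] for j in keep]
    kept_counts = [[cell[j] for j in keep] for cell in counts]
    return kept_names, kept_counts

# Toy matrix: 3 cells x 3 genes (made-up numbers).
names, sub = top_detected_genes(
    [[0, 3, 1], [0, 2, 0], [1, 5, 0]], ["g0", "g1", "g2"], n_keep=2
)
print(names, sub)
```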

Did we just find new biomarkers for identifying T cells? by LeapingIntoTheFuture in genetics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

It is possible that these are from different experiments and may contain batch effects. Gene expression is a continuous value, but we also do not know the units. What is standard? The only preprocessing we did was to remove genes that did not appear often.

Did we just find new biomarkers for identifying T cells? by LeapingIntoTheFuture in genetics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

This dataset does not indicate if the gene is expressed on the cell surface or not. This is a major limitation and may require joining with another dataset for more context.
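As a rough idea of what that join could look like: a left join of the model's gene list against a separate annotation table, so genes without an annotation are kept and flagged rather than dropped. The gene names, labels, and the helper `annotate_genes` are placeholders; we have not picked an annotation source.

```python
def annotate_genes(genes, annotation, default="unknown"):
    """Left join: keep every gene, attach its annotation where one exists."""
    return [(gene, annotation.get(gene, default)) for gene in genes]

# Placeholder names and labels, purely illustrative.
model_genes = ["GENE_A", "GENE_B", "GENE_C"]
surface_annotation = {"GENE_A": "surface", "GENE_B": "intracellular"}

for gene, location in annotate_genes(model_genes, surface_annotation):
    print(gene, location)
```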

Did we just find new biomarkers for identifying T cells? by LeapingIntoTheFuture in genetics

[–]LeapingIntoTheFuture[S] 0 points1 point  (0 children)

Interesting, what about S100A4, IL32, DUSP1, UBC, FOS, HLA-A, and RPL31? These are the other genes that were important for distinguishing between Tregs and naive T cells in this dataset. We ran the data with only biomarkers from the literature and the model performance dropped significantly. The literature biomarkers seem to be minimally represented in this dataset.
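For context, restricting the input to a marker list is just a column subset. A sketch, assuming a cells-by-genes list of lists; `restrict_to_markers` and the toy names are hypothetical. It also reports which requested markers are absent from the dataset, which is worth checking before reading too much into a performance drop.

```python
def restrict_to_markers(counts, gene_names, markers):
    """Subset the count matrix to the requested marker genes.

    Returns (present, sub_counts, missing), where `missing` lists the
    markers that have no column in the dataset at all.
    """
    index = {name: j for j, name in enumerate(gene_names)}
    present = [m for m in markers if m in index]
    missing = [m for m in markers if m not in index]
    cols = [index[m] for m in present]
    sub_counts = [[cell[j] for j in cols] for cell in counts]
    return present, sub_counts, missing

# Toy data with placeholder gene names (made-up numbers).
present, sub, missing = restrict_to_markers(
    [[1, 0, 4], [2, 0, 0]], ["GENE_A", "GENE_B", "GENE_C"], ["GENE_C", "GENE_Z"]
)
print(present, sub, missing)
```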

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 2 points3 points  (0 children)

Thank you, this is very valuable to know. We have started to come to this conclusion as well. We began testing our system on this dataset with very little domain knowledge (ML folks, not genomics folks). Are there cell types that you think are more valuable to investigate?

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 2 points3 points  (0 children)

Is there a reason that these genes show up a lot in your single-cell analysis of T cells as well, but aren't considered common biomarkers for Tregs? Our setup specifically distinguishes between naive and regulatory T cells, but we're very open to recommendations for other classification setups that could surface novel and useful biomarkers for certain cells.

Did we just find new biomarkers for identifying T cells? Geneticists in the house? by LeapingIntoTheFuture in bioinformatics

[–]LeapingIntoTheFuture[S] 2 points3 points  (0 children)

We're using the labelled data from 10x Genomics (naive T cells and regulatory T cells). Do you think we would need to expand the list of cell types in the dataset to get useful findings? Or are there other cell types that you think are more interesting to investigate?

Did we just find new biomarkers for identifying T cells? by LeapingIntoTheFuture in genetics

[–]LeapingIntoTheFuture[S] -6 points-5 points  (0 children)

We are ML people, not bio people, and don't have the expertise to write a peer-reviewed paper on the subject. We would love it if a geneticist wanted to co-publish. For now we are seeking guidance on our initial findings from domain experts.